EDITORIAL


The seven lectures comprising the chapters of this book 
formed the programme for the 1966 Easter Conference in 
Mathematics at Bedford College, London. These conferences
are given annually and the 1965 lectures were published in 
Exploring University Mathematics, vol. 1 (Pergamon Press). 
In the Foreword to this, Professor Eggleston says: “The
lectures are primarily designed for students about to embark
on a degree course of which mathematics is a major part.
Although those attending are drawn from schools in all parts of 
the country, the number involved each year is, unfortunately, 
very limited.... The organizers of the conferences felt that 
these lectures, given by professional mathematicians on 
subjects of current mathematical interest and yet assuming 
little mathematical background, would be of interest to a 
wider public. It was therefore decided to publish them in a 
book and so increase the ‘audience’ many times.” 

The scope of the lectures is fairly wide and is divided 
between pure mathematics and applied mathematics, with a 
natural bias towards the former at this level. Each lecture is 
quite independent, so that getting “lost” in one lecture does 
not mean that a subsequent lecture is unintelligible. This, of 
course, is less important in the book, as the reader has time to 
take each lecture as slowly as necessary for complete
comprehension. Wherever possible, a list of suggestions for further
reading is given. 

Five of the lectures were given by members of the Mathematics
Departments of Colleges in the University of London,
and for the other two we were very pleased to welcome staff
from the Department of Applied Mathematics of the University 
of Sheffield. Each lecturer chose a subject in which he is an 
expert, either as a teacher or as a research worker, and the 
titles of some of the lectures belie the serious nature of the 
mathematics involved. 

Dr. Cohn gave an analytic proof of the essentially geometrical
Isoperimetric Problem: to find the closed curve of
given arc length which encloses the greatest area. This proof
depended on the use of Fourier series which he introduced in
the first part of the lecture. Although this is quite independent
of the lecture given by Professor Eggleston in 1965,* the reader
will certainly find it a very stimulating exercise to compare the 
two methods of solution. 

Professor Kendall's main field of research is magnetoplasma
dynamics, a wide field ranging from laboratory and engineering
problems to astrophysical and geophysical applications.
He has also spent some time in Alaska and Colorado working
on the problem of the formation of noctilucent clouds with
Professor S. Chapman, F.R.S., and they have recently published
a joint paper on this subject. In his lecture he explained,
using elementary methods, the theory which they put forward. 
(The lecture, in condensed form, formed part of an inaugural 
lecture given at the University of Sheffield in April 1966, under 
the title “Pictorial Astrophysical and Geophysical Theories”.)

The title of Dr. Fishel’s talk sounds rather intriguing. The 
mathematical concept which he introduced is fundamental 
to the treatment of number by modern methods. Starting with 
the positive integers as the basis of common knowledge,
he showed how the negative integers are needed in order to 
solve problems posed in terms of the positive integers, and 
how rational numbers are needed to solve problems posed in 
terms of the positive and negative integers. He obtained
definitions in a natural “experimental” fashion, and ended by
proving the “infamous” proposition (−1) × (−1) = +1.

* Exploring University Mathematics, vol. 1, edited by N. J. Hardiman,
Pergamon Press.

Professor Kilmister gave a carefully reasoned lecture on the 
Theory of Special Relativity. Relativity (special and general) 
can, without exaggeration, be claimed to be one of the two 
great revolutions in physical thought that have occurred in 
the twentieth century, the other being quantum mechanics.
In the limited space of this one lecture he developed some 
interesting and surprising results which the reader will certainly 
want to explore further. 

When Mr. Kestelman suggested the title Wallpaper Patterns,
he added: “I think that will give me a chance to touch on some
stimulating aspects of modern mathematics.” This certainly
proved to be true for those who heard him lecture, as I gather 
some of the students spent much time afterwards trying to 
prove that there were more (or fewer) than seventeen possible
patterns. I am sure the reader will also find that this chapter 
provides many hours of interesting and stimulating thought. 

Dr. Burley’s lecture aimed to show that simple models can 
be constructed which lead to sixth-form mathematics and 
demonstrate important physical ideas in kinetic theory. As an 
example an analysis was made of a film which Dr. Burley 
included in his lecture and which unfortunately cannot be 
reproduced in this book as it requires continuous motion. He 
showed an ink blob spreading to form a homogeneous mixture 
with the liquid into which it was dropped; then the ink blob and
liquid returning to their original state!

Professor Griffith's lecture on Differential Equations was 
given to an invited audience of the more advanced pupils 
(i.e. those who had already passed A-level), the school 
teachers and the university lecturers. He explained a rather 
remarkable graphical method for establishing the general 
properties of the solution of a differential equation and applied 
it to the well-known equation for the motion of a simple 
pendulum when the amplitude of the oscillation is not small. 
I should like to thank all those who have taken part in the
writing and proof-reading of this book, especially Dr. Sargent, 
Dr. Rae and Miss Bratley, and the Pergamon Press for the 
care which they have given to the production of the book. 
CHAPTER 1


FOURIER SERIES 
AND THE 

ISOPERIMETRIC PROBLEM 

J. H. E. Cohn 


Introduction 

The isoperimetric problem is to find the closed curve of
given arc length which encloses the greatest area. It was discussed
at last year’s conference by Professor H. G. Eggleston.†
As was pointed out on that occasion, this problem is of greater
depth than might appear at first sight. It is by no means trivial
that there exists a solution at all: it might be that no matter
what curve were suggested another could be found with the
same arc length and greater area. Such is the situation for the
problem of finding the largest real number less than unity;
there is no solution since if $x < 1$, then $\tfrac12(x + 1)$ exceeds $x$ but
is still less than unity. Furthermore, it is not clear that, even
if there is a “best” curve, it is in any way unique; there
might be several quite differently shaped curves with the same
arc length and the same “best” area.

It was “known” to the Greeks that the circle is the solution
to the problem. However, a satisfactory proof was not given
until modern times. The solution we shall give is due to Hurwitz,
and was published in 1902. Its interest is largely in that it
provides an entirely analytical proof of what appears to be a
purely geometrical problem. This is very much in line with
present trends in mathematics.

† See Exploring University Mathematics, vol. 1, chapter 7.


Fourier series 

Before discussing Fourier series we shall need the following: 
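Lemma. Let $m$, $n$ be positive integers. Then

$$\int_0^{2\pi} \sin m\theta \, d\theta = 0, \tag{1}$$

$$\int_0^{2\pi} \cos m\theta \, d\theta = 0, \tag{2}$$

$$\int_0^{2\pi} \sin n\theta \cos m\theta \, d\theta = 0, \tag{3}$$

$$\int_0^{2\pi} \cos m\theta \cos n\theta \, d\theta = \begin{cases} 0 & \text{if } m \ne n, \\ \pi & \text{if } m = n, \end{cases} \tag{4}$$

$$\int_0^{2\pi} \sin m\theta \sin n\theta \, d\theta = \begin{cases} 0 & \text{if } m \ne n, \\ \pi & \text{if } m = n. \end{cases} \tag{5}$$

Proof. We have

$$\int_0^{2\pi} \sin m\theta \, d\theta = \left[ -\frac{1}{m} \cos m\theta \right]_0^{2\pi} = \frac{1}{m}(1 - \cos 2m\pi) = 0,$$

$$\int_0^{2\pi} \cos m\theta \, d\theta = \left[ \frac{1}{m} \sin m\theta \right]_0^{2\pi} = \frac{1}{m} \sin 2m\pi = 0,$$

$$\int_0^{2\pi} \sin n\theta \cos m\theta \, d\theta = \tfrac12 \int_0^{2\pi} \left[ \sin (m+n)\theta - \sin (m-n)\theta \right] d\theta = \tfrac12 \int_0^{2\pi} \sin (m+n)\theta \, d\theta - \tfrac12 \int_0^{2\pi} \sin (m-n)\theta \, d\theta.$$

Now the first integral is zero by equation (1) and if $m > n$ so
is the second. If $m = n$, the second integral is clearly zero. If
$m < n$, then $\sin (m-n)\theta = -\sin (n-m)\theta$ and again the second
integral is zero, by equation (1).

To prove equations (4) and (5) we observe that

$$\int_0^{2\pi} \cos m\theta \cos n\theta \, d\theta - \int_0^{2\pi} \sin m\theta \sin n\theta \, d\theta = \int_0^{2\pi} \cos (m+n)\theta \, d\theta = 0 \quad \text{by equation (2),}$$

and

$$\int_0^{2\pi} \cos m\theta \cos n\theta \, d\theta + \int_0^{2\pi} \sin m\theta \sin n\theta \, d\theta = \int_0^{2\pi} \cos |m-n|\theta \, d\theta$$

and this equals $2\pi$ if $m = n$ and by equation (2) is zero if $m \ne n$.
Halving the sum and the difference of these two results gives
equations (4) and (5). This concludes the proof.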
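
The relations (1) to (5) are easy to check numerically. The following is a minimal sketch in Python, not part of the original lecture; the helper names and the choice of the midpoint rule are assumptions made purely for illustration.

```python
import math

def integrate_0_2pi(g, steps=2000):
    """Midpoint-rule approximation of the integral of g over [0, 2*pi]."""
    h = 2 * math.pi / steps
    return h * sum(g((i + 0.5) * h) for i in range(steps))

def cos_cos(m, n):
    # Left-hand side of equation (4).
    return integrate_0_2pi(lambda t: math.cos(m * t) * math.cos(n * t))

def sin_sin(m, n):
    # Left-hand side of equation (5).
    return integrate_0_2pi(lambda t: math.sin(m * t) * math.sin(n * t))

def sin_cos(n, m):
    # Left-hand side of equation (3).
    return integrate_0_2pi(lambda t: math.sin(n * t) * math.cos(m * t))

# Tabulate the three integrals for small m and n; the diagonal entries of
# the first two columns should be close to pi, everything else close to 0.
for m in range(1, 4):
    for n in range(1, 4):
        print(m, n, round(cos_cos(m, n), 6), round(sin_sin(m, n), 6),
              round(sin_cos(n, m), 6))
```

For trigonometric integrands of this kind the midpoint rule over a full period is accurate to rounding error, which is why so simple a rule suffices here.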

One of the central problems in the theory of Fourier series
is to enquire for what given functions $f(\theta)$ it is possible to find
a series

$$\tfrac12 a_0 + \sum_{n=1}^{\infty} \{ a_n \cos n\theta + b_n \sin n\theta \} \tag{6}$$

which is convergent and whose sum equals $f(\theta)$ for all, or
nearly all, values of $\theta$. At first sight it might seem surprising
that there should be any such functions, but there are some,
for example

$$\tfrac12 + 2 \cos 97\theta + \sin 308\theta.$$

In the first place, it is clear that each term in the series (6) is
unaffected by increasing $\theta$ by $2\pi$. Thus if the series does have
the required properties, we must have

$$f(\theta + 2\pi) = f(\theta), \tag{7}$$

i.e. $f(\theta)$ must be periodic with period $2\pi$; this is a necessary
condition. In fact, it turns out that this condition is almost,
but not quite, sufficient.

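Suppose for the moment that $f(\theta)$ does equal the sum of the
convergent series (6), and that we may integrate all the series
that occur “term-by-term”.† Then we would have, if $m$ is any
positive integer,

$$\int_0^{2\pi} f(\theta) \cos m\theta \, d\theta = \tfrac12 a_0 \int_0^{2\pi} \cos m\theta \, d\theta + \sum_{n=1}^{\infty} \left\{ a_n \int_0^{2\pi} \cos n\theta \cos m\theta \, d\theta + b_n \int_0^{2\pi} \sin n\theta \cos m\theta \, d\theta \right\} = \pi a_m,$$

by equations (2), (3) and (4), since all the other terms vanish. Thus

$$a_m = \frac{1}{\pi} \int_0^{2\pi} f(\theta) \cos m\theta \, d\theta \tag{8}$$

provided that $m > 0$. If $m = 0$ we have

$$\int_0^{2\pi} f(\theta) \, d\theta = \int_0^{2\pi} \tfrac12 a_0 \, d\theta + \sum_{n=1}^{\infty} \left\{ a_n \int_0^{2\pi} \cos n\theta \, d\theta + b_n \int_0^{2\pi} \sin n\theta \, d\theta \right\} = a_0 \pi$$

and so equation (8) is true for $m = 0$ too.‡ In exactly the same
way we obtain

$$b_m = \frac{1}{\pi} \int_0^{2\pi} f(\theta) \sin m\theta \, d\theta. \tag{9}$$

† There are, of course, series for which this is not true; e.g. if
$u_n(x) = n x^{n-1} - (n+1) x^n$, then

$$\sum_{n=1}^{N} u_n(x) = 1 - (N+1) x^N$$

and so

$$\sum_{n=1}^{\infty} u_n(x) = 1 \quad \text{for } 0 < x < 1.$$

Thus

$$\int_0^1 \sum_{n=1}^{\infty} u_n(x) \, dx = 1,$$

but

$$\sum_{n=1}^{\infty} \int_0^1 u_n(x) \, dx = \sum_{n=1}^{\infty} \left[ x^n - x^{n+1} \right]_0^1 = 0.$$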

Now, of course, the formulae (8) and (9) have not been
proved in the sense that the series (6) can be said to converge to
the sum $f(\theta)$. All we have shown is that if a Fourier series
expansion of $f(\theta)$ does exist, the coefficients $a_m$, $b_m$ are quite
likely to satisfy equations (8) and (9). As it happens, it is most
profitable to define the numbers $a_m$ and $b_m$ by equations (8) and
(9). This can be done for any integrable function $f(\theta)$. We then
consider the behaviour of

$$f_N(\theta) = \tfrac12 a_0 + \sum_{n=1}^{N} \{ a_n \cos n\theta + b_n \sin n\theta \}$$

in the limit as $N \to \infty$. Very often it is possible to show that
$f_N(\theta)$ does tend to a limit, and to $f(\theta)$ for most values of $\theta$. To
prove a theorem of this type is of course quite difficult, and I
cannot prove such a result here. I shall, however, need the
following result later:

Theorem. If for $0 \le \theta < 2\pi$, $f(\theta)$ is of bounded variation,
is continuous except at a finite number of points and satisfies
equation (7), then

(a) the series (6) converges for each $\theta$,

(b) the sum of the series (6) equals $f(\theta)$ except at the points
of discontinuity,

(c) “term-by-term” integration is valid.

Two points in this theorem may require some explanation. In
the first place, what is the sum of the series at the points of
discontinuity? In fact it always equals the average of the
two values to which $f(\theta)$ tends as $\theta$ tends to the point of
discontinuity respectively from the left and right. Thus take
the case of the function illustrated in Fig. 1.1, given by

$$f(\theta) = \begin{cases} \theta & \text{if } 0 \le \theta < \pi, \\ (2\pi - \theta)(\theta - \pi) & \text{if } \pi \le \theta < 2\pi. \end{cases}$$

In this case it is seen that $f(\theta) \to \pi$ as $\theta$ tends to $\pi$ from the
left, and $f(\theta) \to 0$ as $\theta$ tends to $\pi$ from the right. It turns out
that the sum of the series equals $\tfrac12\pi$ when $\theta = \pi$.

‡ This explains why we take the first term in equation (6) to be $\tfrac12 a_0$ instead
of $a_0$.
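
The behaviour at the point of discontinuity can be observed numerically. The sketch below is an illustration added here, not part of the lecture: it approximates $a_m$ and $b_m$ from equations (8) and (9) by the midpoint rule for the function of Fig. 1.1 and evaluates the partial sums at $\theta = \pi$. The function names and the number of integration steps are assumptions.

```python
import math

def f(theta):
    """The function of Fig. 1.1, defined for 0 <= theta < 2*pi."""
    if theta < math.pi:
        return theta
    return (2 * math.pi - theta) * (theta - math.pi)

def coefficients(m, steps=4000):
    """Midpoint-rule approximations to a_m and b_m from equations (8) and (9)."""
    h = 2 * math.pi / steps
    a = b = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        a += f(t) * math.cos(m * t)
        b += f(t) * math.sin(m * t)
    return a * h / math.pi, b * h / math.pi

def partial_sum(theta, N):
    """f_N(theta) = a_0/2 + sum over n = 1..N of (a_n cos n*theta + b_n sin n*theta)."""
    a0, _ = coefficients(0)
    total = 0.5 * a0
    for n in range(1, N + 1):
        a_n, b_n = coefficients(n)
        total += a_n * math.cos(n * theta) + b_n * math.sin(n * theta)
    return total

# Partial sums at the jump, printed next to the average of the two
# one-sided limits, (pi + 0) / 2.
for N in (5, 20, 80):
    print(N, partial_sum(math.pi, N), math.pi / 2)
```

The partial sums approach $\tfrac12\pi$ only slowly as $N$ grows, since the coefficients of a function with corners and jumps decay like a power of $1/n$ rather than faster.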



Secondly, the theorem requires $f(\theta)$ to be of “bounded
variation”. The variation is roughly the amount the function
varies over the range considered irrespective of the sign
of the variation; if this is finite, the function is said to be of
bounded variation. Thus the above function is of bounded
variation, the variation being

$$\pi + \pi + \tfrac14\pi^2 + \tfrac14\pi^2 = 2\pi + \tfrac12\pi^2,$$

the sum of the differences of the ordinates at consecutive
maxima and minima. Most ordinary functions are of bounded
variation. However, the function illustrated in Fig. 1.2,
$$f(\theta) = \sin\left( \frac{1}{2\pi - \theta} \right), \qquad 0 \le \theta < 2\pi,$$

is not of bounded variation. 
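
The variation of the function of Fig. 1.1 can be estimated by sampling. The following minimal sketch, added for illustration, sums the absolute differences of successive sampled ordinates on a uniform grid and compares the result with $2\pi + \tfrac12\pi^2$; the function name `sampled_variation` and the grid sizes are assumptions.

```python
import math

def f(theta):
    """The function of Fig. 1.1, defined for 0 <= theta < 2*pi."""
    if theta < math.pi:
        return theta
    return (2 * math.pi - theta) * (theta - math.pi)

def sampled_variation(func, steps):
    """Sum of |func(t_{i+1}) - func(t_i)| over a uniform grid on [0, 2*pi)."""
    h = 2 * math.pi / steps
    values = [func(i * h) for i in range(steps)]
    return sum(abs(v1 - v0) for v0, v1 in zip(values, values[1:]))

exact = 2 * math.pi + 0.5 * math.pi ** 2
for steps in (100, 1000, 10000):
    print(steps, sampled_variation(f, steps), exact)
```

For a function of bounded variation the sampled sums settle down as the grid is refined; for a function that oscillates infinitely often near one point they would instead keep growing.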

Now consider a continuous function $f(\theta)$, defined for
$0 \le \theta < 2\pi$, which is differentiable except at a finite number
of points, and such that both $f(\theta)$ and $f'(\theta)$ possess Fourier
series. Then we may write

$$f(\theta) = \tfrac12 a_0 + \sum_{n=1}^{\infty} (a_n \cos n\theta + b_n \sin n\theta) \tag{10}$$

and

$$f'(\theta) = \tfrac12 \alpha_0 + \sum_{n=1}^{\infty} (\alpha_n \cos n\theta + \beta_n \sin n\theta). \tag{11}$$

Now if we could differentiate equation (10) term-by-term we
would obtain

$$f'(\theta) = \sum_{n=1}^{\infty} \{ -n a_n \sin n\theta + n b_n \cos n\theta \}. \tag{12}$$

The question now arises whether this is correct, i.e. whether
the series (11) and (12) are identical. If so we should have to
have $\alpha_0 = 0$; $\alpha_m = m b_m$; $\beta_m = -m a_m$. This is in fact the case,
for we have by equation (9), applied to the function $f'(\theta)$,

$$\pi \beta_m = \int_0^{2\pi} f'(\theta) \sin m\theta \, d\theta = \Big[ f(\theta) \sin m\theta \Big]_0^{2\pi} - m \int_0^{2\pi} f(\theta) \cos m\theta \, d\theta = -m \pi a_m, \quad \text{by equation (8),}$$

and similarly for $\alpha_m$, using equation (7).
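
The relations $\alpha_m = m b_m$ and $\beta_m = -m a_m$ can be checked numerically for a particular smooth periodic function. In the sketch below the test function $e^{\sin\theta + \frac12\cos 2\theta}$ is an arbitrary choice made for illustration, as are the helper names; it is not taken from the lecture.

```python
import math

def f(theta):
    """An arbitrary smooth function of period 2*pi (chosen for illustration)."""
    return math.exp(math.sin(theta) + 0.5 * math.cos(2 * theta))

def f_prime(theta):
    """The derivative of f, obtained by the chain rule."""
    return (math.cos(theta) - math.sin(2 * theta)) * f(theta)

def coefficients(func, m, steps=2000):
    """Midpoint-rule approximations to the cosine and sine coefficients of func,
    as defined by equations (8) and (9)."""
    h = 2 * math.pi / steps
    c = s = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        c += func(t) * math.cos(m * t)
        s += func(t) * math.sin(m * t)
    return c * h / math.pi, s * h / math.pi

# Both printed differences should be zero up to rounding error.
for m in range(0, 4):
    a_m, b_m = coefficients(f, m)
    alpha_m, beta_m = coefficients(f_prime, m)
    print(m, alpha_m - m * b_m, beta_m + m * a_m)
```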

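Finally, suppose that we have two functions expressed as
Fourier series, $f(\theta)$ in the form (10) and

$$F(\theta) = \tfrac12 A_0 + \sum_{n=1}^{\infty} (A_n \cos n\theta + B_n \sin n\theta). \tag{13}$$

Then using equations (8), (9) and (13) we obtain

$$\int_0^{2\pi} f(\theta) F(\theta) \, d\theta = \tfrac12 A_0 \int_0^{2\pi} f(\theta) \, d\theta + \sum_{n=1}^{\infty} \left[ A_n \int_0^{2\pi} f(\theta) \cos n\theta \, d\theta + B_n \int_0^{2\pi} f(\theta) \sin n\theta \, d\theta \right] = \pi \left[ \tfrac12 a_0 A_0 + \sum_{n=1}^{\infty} (a_n A_n + b_n B_n) \right]. \tag{14}$$

In particular, taking $F(\theta) = f(\theta)$, we obtain

$$\int_0^{2\pi} \{ f(\theta) \}^2 \, d\theta = \pi \left\{ \tfrac12 a_0^2 + \sum_{n=1}^{\infty} (a_n^2 + b_n^2) \right\}. \tag{15}$$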
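
Equation (15) can be tested numerically on the function of Fig. 1.1. The following sketch is an added illustration under the assumption that midpoint-rule integration is accurate enough for the purpose; it compares the integral of $\{f(\theta)\}^2$ with partial sums of the right-hand side. All names are hypothetical.

```python
import math

def f(theta):
    """The function of Fig. 1.1, defined for 0 <= theta < 2*pi."""
    if theta < math.pi:
        return theta
    return (2 * math.pi - theta) * (theta - math.pi)

def coefficients(m, steps=4000):
    """Midpoint-rule approximations to a_m and b_m from equations (8) and (9)."""
    h = 2 * math.pi / steps
    a = b = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        a += f(t) * math.cos(m * t)
        b += f(t) * math.sin(m * t)
    return a * h / math.pi, b * h / math.pi

def integral_of_square(steps=4000):
    """Left-hand side of equation (15) by the midpoint rule."""
    h = 2 * math.pi / steps
    return h * sum(f((i + 0.5) * h) ** 2 for i in range(steps))

def parseval_sum(N):
    """Right-hand side of equation (15), truncated after N terms."""
    a0, _ = coefficients(0)
    total = 0.5 * a0 ** 2
    for n in range(1, N + 1):
        a_n, b_n = coefficients(n)
        total += a_n ** 2 + b_n ** 2
    return math.pi * total

lhs = integral_of_square()
for N in (10, 100):
    print(N, parseval_sum(N), lhs)
```

Because every term on the right of (15) is non-negative, the truncated sums increase towards the integral from below.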


The isoperimetric problem 

As we indicated earlier, the problem is to determine the
closed curve of given arc length, $l$, with the greatest area, $A$.
By calculation we obtain:

Curve                   $A l^{-2}$
Equilateral triangle    $\sqrt{3}/36 = 0.0481$
Square                  $1/16 = 0.0625$
Regular hexagon         $\sqrt{3}/24 = 0.0722$
Regular octagon         $(1 + \sqrt{2})/32 = 0.0754$
Circle                  $1/4\pi = 0.0796$
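
The entries of the table can be reproduced from the formula $A l^{-2} = 1/(4n \tan(\pi/n))$ for a regular polygon with $n$ sides, which follows from writing the area as half the perimeter times the apothem. The sketch below is an added illustration; the function name is an assumption.

```python
import math

def area_over_length_squared(n):
    """A / l^2 for a regular polygon with n sides and perimeter l.

    The side is l/n and the apothem is (l/n) / (2 tan(pi/n)), so the area
    is (1/2) * l * apothem = l^2 / (4 n tan(pi/n)).
    """
    return 1.0 / (4.0 * n * math.tan(math.pi / n))

CIRCLE = 1.0 / (4.0 * math.pi)

for n in (3, 4, 6, 8, 100, 10000):
    print(n, round(area_over_length_squared(n), 4))
print("circle", round(CIRCLE, 4))
```

The values increase with $n$ and approach, but never reach, the value for the circle.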


However, beyond suggesting that the circle is the “best”
answer, tables of this type have little value. By a slightly more
sophisticated approach we can prove that of all polygons with
$n$ sides, the regular polygon is the one with $A l^{-2}$ as large as
possible, and also that for the circle this quantity is greater
than for any regular polygon. This reasoning shows that the
circle is better than any polygon. It is clear, though, that
arguments of this type will not solve our problem. What we
need is to show that for any curve $l^2 - 4\pi A \ge 0$, with equality
only for the circle.

Consider then all curves of length $l$ with the properties:

(a) that they are continuous closed curves;

(b) that they are sectionally smooth, i.e. apart from a finite
number of points (corners) they have a continuously
turning tangent.

The restrictions imposed here are not really significant,
although they may appear so. Measure the arc length, $s$, in a
fixed direction from any fixed point on the curve. Then the
coordinates of the general point on the curve can be expressed
parametrically in terms of $s$, and since the curve is closed and
has arc length $l$, both $x$ and $y$ are periodic functions of $s$ with
period $l$. Throughout our discussion of Fourier series we were
concerned with functions of period $2\pi$; we therefore find it
convenient to make the substitution

$$\theta = \frac{2\pi s}{l}. \tag{16}$$
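Then both $x = x(\theta)$ and $y = y(\theta)$ are periodic functions of $\theta$
with period $2\pi$, and so by our theorem we have

$$x(\theta) = \tfrac12 a_0 + \sum_{n=1}^{\infty} (a_n \cos n\theta + b_n \sin n\theta), \tag{17}$$

$$y(\theta) = \tfrac12 c_0 + \sum_{n=1}^{\infty} (c_n \cos n\theta + d_n \sin n\theta), \tag{18}$$

$$\frac{dx}{d\theta} = \sum_{n=1}^{\infty} n (-a_n \sin n\theta + b_n \cos n\theta), \tag{19}$$

$$\frac{dy}{d\theta} = \sum_{n=1}^{\infty} n (-c_n \sin n\theta + d_n \cos n\theta), \tag{20}$$

where equality in equations (17) and (18) is for all $\theta$, and in
equations (19) and (20) is for all $\theta$ with a finite number of
exceptions. Then using equation (15) we have

$$\int_0^{2\pi} \left( \frac{dx}{d\theta} \right)^2 d\theta = \pi \sum_{n=1}^{\infty} n^2 (a_n^2 + b_n^2), \qquad \int_0^{2\pi} \left( \frac{dy}{d\theta} \right)^2 d\theta = \pi \sum_{n=1}^{\infty} n^2 (c_n^2 + d_n^2).$$

Thus

$$\pi \sum_{n=1}^{\infty} n^2 (a_n^2 + b_n^2 + c_n^2 + d_n^2) = \int_0^{2\pi} \left\{ \left( \frac{dx}{d\theta} \right)^2 + \left( \frac{dy}{d\theta} \right)^2 \right\} d\theta = \left( \frac{l}{2\pi} \right)^2 \int_0^{2\pi} \left\{ \left( \frac{dx}{ds} \right)^2 + \left( \frac{dy}{ds} \right)^2 \right\} d\theta \quad \text{by equation (16)}$$

$$= \left( \frac{l}{2\pi} \right)^2 2\pi = \frac{l^2}{2\pi},$$

since $(dx/ds)^2 + (dy/ds)^2 = 1$ when $s$ is the arc length, i.e.

$$l^2 = 2\pi^2 \sum_{n=1}^{\infty} n^2 (a_n^2 + b_n^2 + c_n^2 + d_n^2). \tag{21}$$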

Also

$$A = \int x \, dy, \quad \text{the integral being taken round the curve,}$$

$$= \int_0^{2\pi} x \frac{dy}{d\theta} \, d\theta = \pi \sum_{n=1}^{\infty} n (a_n d_n - b_n c_n), \quad \text{using equations (14), (17) and (20).}$$

Thus

$$l^2 - 4\pi A = 2\pi^2 \sum_{n=1}^{\infty} \left\{ n^2 (a_n^2 + b_n^2 + c_n^2 + d_n^2) - 2n (a_n d_n - b_n c_n) \right\}$$

$$= 2\pi^2 \sum_{n=1}^{\infty} \left\{ (n a_n - d_n)^2 + (n b_n + c_n)^2 + (n^2 - 1)(c_n^2 + d_n^2) \right\}.$$

Since every term is non-negative we have $l^2 - 4\pi A \ge 0$ with
equality if and only if for all $n \ge 1$,

$$n a_n - d_n = 0; \quad n b_n + c_n = 0; \quad (n^2 - 1)(c_n^2 + d_n^2) = 0.$$

Thus we must have

$$d_n = n a_n \ \text{ if } n \ge 1, \qquad c_n = -n b_n \ \text{ if } n \ge 1, \qquad c_n = d_n = 0 \ \text{ if } n \ge 2,$$

i.e. $a_n = b_n = c_n = d_n = 0$ if $n \ge 2$; $d_1 = a_1$ and $b_1 = -c_1$.
Thus $l^2 - 4\pi A \ge 0$, with equality if and only if

$$x = \tfrac12 a_0 + a_1 \cos \theta + b_1 \sin \theta,$$

$$y = \tfrac12 c_0 - b_1 \cos \theta + a_1 \sin \theta,$$

i.e. if and only if

$$(x - \tfrac12 a_0)^2 + (y - \tfrac12 c_0)^2 = a_1^2 + b_1^2,$$

that is, if and only if the curve is a circle.

Thus for the circle $A l^{-2} = 1/4\pi$ and for all other curves
$A l^{-2} < 1/4\pi$. This concludes the proof.
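
The two series for $l^2$ and $A$ can be tried out on a concrete curve. The sketch below is an added illustration, not part of the lecture: it samples an ellipse with semi-axes 2 and 1 (an arbitrary choice), re-parametrizes it by arc length using linear interpolation, computes discrete approximations to the coefficients $a_n, b_n, c_n, d_n$, and evaluates equation (21) and the series for $A$. The function names, the sample counts and the truncation at $N = 40$ are all assumptions.

```python
import math

def resample_by_arc_length(points, K):
    """Return the total length of the closed polygon through `points` and
    K points equally spaced in arc length along it."""
    closed = points + [points[0]]
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(closed, closed[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    length = cumulative[-1]
    resampled, seg = [], 0
    for j in range(K):
        s = length * j / K
        while cumulative[seg + 1] < s:
            seg += 1
        w = (s - cumulative[seg]) / (cumulative[seg + 1] - cumulative[seg])
        (x0, y0), (x1, y1) = closed[seg], closed[seg + 1]
        resampled.append((x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
    return length, resampled

def coefficients(values, n):
    """Discrete versions of equations (8) and (9) for samples taken at
    theta_j = 2*pi*j/K."""
    K = len(values)
    a = 2.0 / K * sum(v * math.cos(2 * math.pi * n * j / K) for j, v in enumerate(values))
    b = 2.0 / K * sum(v * math.sin(2 * math.pi * n * j / K) for j, v in enumerate(values))
    return a, b

def length_and_area_from_series(points, K=1024, N=40):
    """Return (l, l^2 from equation (21), A from the series for the area)."""
    length, pts = resample_by_arc_length(points, K)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    sum_l = sum_A = 0.0
    for n in range(1, N + 1):
        a, b = coefficients(xs, n)
        c, d = coefficients(ys, n)
        sum_l += n * n * (a * a + b * b + c * c + d * d)
        sum_A += n * (a * d - b * c)
    return length, 2 * math.pi ** 2 * sum_l, math.pi * sum_A

# An ellipse traversed anticlockwise, so that the area comes out positive.
ellipse = [(2 * math.cos(2 * math.pi * i / 4000), math.sin(2 * math.pi * i / 4000))
           for i in range(4000)]
length, l_squared_series, area_series = length_and_area_from_series(ellipse)
print("l^2 directly    :", length ** 2)
print("l^2 from (21)   :", l_squared_series)
print("A from series   :", area_series, " (exact value 2*pi)")
print("l^2 - 4*pi*A    :", l_squared_series - 4 * math.pi * area_series)
```

For the ellipse the last quantity is strictly positive; replacing the ellipse by a sampled circle should bring it down to zero within the accuracy of the sketch.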

REFERENCE FOR FURTHER READING
Maxwell, E. A., An Analytical Calculus, vol. 4, chapter xxviii, pp. 186-215.
Cambridge, 1957.


CHAPTER 2 


THE MATHEMATICS OF 
NIGHT SHINING CLOUDS 

P. C. Kendall 


1. The earth’s atmosphere at a height of 80-100 km 
(50-60 miles) 

The earth’s atmosphere retains its ground-level composition 
up to a height of about 100 km (60 miles, say) where it begins 
to change. This is an interesting level in the atmosphere 
where many things are happening. However, before describing 
them, we should point out that many people are interested in 
the atmosphere. From a practical viewpoint it is essential for 
aerodynamical engineers and space-craft designers to know 
the properties of the atmosphere, even at great height. For 
example, missile or satellite nose cones begin to melt at great 
height on re-entry into the earth’s atmosphere. Other interested 
scientists are geophysicists, meteorologists, and chemists and 
physicists interested in collision phenomena. 

The main atmospheric properties are its pressure, density, 
temperature, composition and movement. In today’s lecture I 
shall be chiefly concerned with the temperature. Aeroplane, 
rocket and balloon measurements give a height-temperature 
graph of the type shown in Fig. 2.1. Note that when dealing
with the atmosphere it is conventional to turn the graph on its
side. The vertical axis thus measures vertical height above
sea-level. It is seen that the temperature at ground-level is a
decreasing function of height. As the height above the ground
increases the temperature reaches first a local minimum, then a
local maximum, followed by a second minimum. The temperature
then increases upwards, rising rapidly to a high level.



Fig. 2.1. A sketch showing the behaviour of atmospheric temperature
$T$ (in °K) as a function of height above the earth’s surface.


The problems here discussed concern the behaviour of the 
atmosphere and atmospheric dust particles in the neighbourhood
of this second temperature minimum, known as the
mesopause. The region just below the mesopause is known as 
the mesosphere, and the region just above is known as the 
thermosphere. 

The mesopause is always 80 km (50 miles, say) above the
ground and is cooler in summer than in winter. Noctilucent 
clouds, described in the next section, always occur at the exact 
level of the mesopause. 










Fig. 2.2. Noctilucent clouds seen from Fairbanks, Alaska, on 6/7 August 1963. 

Photograph by Dr. B. T. Fogle, Geophysical Institute.
2. Night shining clouds

Noctilucent clouds (NLC), or night shining clouds, occur in
summer above latitudes of 45°N. The clouds lie 50 miles above
the earth’s surface at the mesopause and have a wispy wave-like
structure (Fig. 2.2). They are visible only by back lighting
(scattered light) and so may be seen only just after the sun has
set, or just before dawn (Fig. 2.3). Of course, in some latitudes
the sun may be only just below the horizon for the whole of the
night.

Fig. 2.3. Showing how an observer would see noctilucent clouds by
scattered light.

The clouds are so tenuous that stars can be seen clearly
through them; in fact the clouds are composed of very fine
particles indeed (of at most $10^{-5}$ cm in diameter). In Sweden
scientists launched two rockets which exposed sticky surfaces
at 80 km. They found a thousandfold increase in dust concentration
within NLC. Some of the dust particles appeared
to be coated with ice.

Probably the ice coating is connected with the low temperature
at the mesopause; and the fact that NLC appear only
in summer suggests that their basic cause is the extra low
summer temperature at the mesopause. The main problem of
NLC is, therefore, why a low temperature should cause NLC.
I shall try to show how such a low temperature could cause an
increase in the dust concentration with a ledge at the mesopause,
and simultaneously increase the amount of water
vapour present.† First, though, some basic thoughts.


3. Stability 

For our purposes a system is said to be stable if, when it is
subjected to a small displacement from equilibrium, it tends to
return to its equilibrium state. If the system tends to depart
even further from its equilibrium state, after a small displacement
from equilibrium, the system is said to be unstable.
Otherwise, the system is said to be neutrally stable.

Fig. 2.4. Different types of mechanical equilibrium illustrated by
placing a ball at the top of a hill (unstable equilibrium), in a valley
(stable equilibrium), and on a flat plane (neutral equilibrium).

† Chapman and Kendall, Quarterly Journal of the Royal Meteorological
Society, vol. 91 (1965), p. 115.
Examples range from simple ones, such as that of a ball
placed at the top of a hill (unstable), in the bottom of a valley
(stable) or on a horizontal plane (neutrally stable), as in Fig.
2.4; to the problem of a “heavy” fluid resting in equilibrium
on a less dense “lighter” fluid [Fig. 2.5(a)]. In the latter problem
we can see intuitively that the “lighter” fluid must rise by
Archimedes’ principle. However, this can only happen after
a small displacement, such as that shown in Fig. 2.5(b), has
occurred. Then the upper parts of the “lighter” fluid will be
buoyant and will rise, the lower parts of the “heavier” fluid
will sink, and the displacement from equilibrium will grow as
in Fig. 2.5(c).

Fig. 2.5. (a) The equilibrium configuration when a dense liquid lies
on top of a less dense liquid; (b) a small displacement of the system
from equilibrium; (c) growth of such a small displacement.



4. Thermal instability 

(i) Consider an atmosphere in which the temperature
decreases rapidly upwards. Suppose that a fluid element is
displaced upwards. It will expand so as to equalize pressures,
but then, under some conditions, its density will be lower than
that of its surroundings. So by Archimedes’ principle the displacement
from equilibrium will continue. It turns out† that the
condition for the fluid element to rise is that the temperature
should decrease rapidly upwards. (The layer of the atmosphere
above the fluid element is “cooler than it ought to be”, and is
hence “heavier than it ought to be”!) Likewise, under these
conditions, a fluid element displaced downwards would sink
even further.

(ii) If the temperature increases rapidly upwards, a displaced 
fluid element tends to return to its original position more 
rapidly than when the temperature is constant. 

Thus in (i) we say that the atmosphere is unstable because a 
fluid element displaced from equilibrium tends to move even 
further from equilibrium. In (ii) we say that the atmosphere is 
stable. 

Fig. 2.6. The suggested alteration in temperature near the mesopause
during a display of NLC (height, from 70 km to 90 km, plotted against
temperature).

Conditions near the mesopause. It follows that near the
mesopause (if the local temperature drop were low enough)
the atmosphere would be stable above 80 km and unstable
below 80 km. We consider this condition to be the unusual
one which causes NLC displays. Figure 2.6 shows this
hypothetical, unusual state diagrammatically.

† A less superficial explanation can be found in the Encyclopaedic
Dictionary of Physics (ed. J. Thewlis), Pergamon, 1961, under the heading
“Instability, Static”, vol. 3, p. 844.


5. The turbopause 

The line separating the stable and unstable regions of the 
atmosphere is called the turbopause. Below the turbopause 
the atmosphere is unstable and will be constantly overturning. 
It is therefore in a state of turbulence, i.e. well mixed. The 
difference between the atmosphere above and below the turbopause
is the difference between a perfectly still fluid and a
fluid being stirred vigorously. Rockets have released visible 
chemicals at these levels, and under normal conditions place 
the turbopause at the 100 km level as in Fig. 2.7. 


Fig. 2.7. The level of the turbopause under normal conditions: still air
above the turbopause at 100 km, mixed air below.


We shall see that if the turbopause were to descend to the
mesopause a display of NLC might be caused. Such coincidence
of the turbopause and mesopause might be the result
of the very low summer mesopause temperatures referred to
earlier. The extra low temperature would enhance instability
below the mesopause and also stabilize the region above the
mesopause. The clouds would then appear as a ridge of
meteoric dust formed in the still air above the mesopause.
Mathematically the theory runs as follows:


6. Fall of meteoric dust (still atmosphere)

Terminal velocity. If a small particle of meteoric dust of
mass $m$ falls under uniform gravity $g$ through a uniform atmosphere,
the resistance to motion is proportional to the downwards
velocity $v$ and may be written as $kv$, where $k$ ($> 0$) is
constant. The downwards equation of motion is thus

$$m \frac{dv}{dt} = mg - kv. \tag{1}$$

To integrate, write $v = u + mg/k$. Then equation (1) becomes

$$m \frac{du}{dt} = -ku,$$

so that

$$\frac{1}{u} \frac{du}{dt} = -\frac{k}{m}.$$

Integrating we obtain

$$\log u = C - kt/m,$$

where $C$ is a constant of integration. Taking exponentials
and putting $u_0 = e^C$ gives

$$u = u_0 e^{-kt/m}.$$

It follows that $\lim_{t \to \infty} u = 0$ and so

$$\lim_{t \to \infty} v = mg/k.$$
So no matter what speed a meteoric dust particle starts with,
its speed always tends towards $mg/k$. This downwards velocity
is called the terminal velocity.

Time to achieve terminal velocity. Unless the particle
actually starts with the terminal velocity it never actually
reaches it. However, it does approach it very closely. The
quantity $u_0 e^{-kt/m}$ is a measure of how close. Thus, the velocity
relative to the terminal velocity decreases by a factor $e^{-1}$
($e = 2.718\ldots$) in a time $T = m/k$. This is known as the characteristic
time. At the 80-km level the theory of gases gives the
terminal velocity to be 18 cm/sec. Thus $mg/k = 18$, giving the
characteristic time to be

$$T = \frac{18}{g} \approx \frac{1}{50} \text{ sec.}$$

A particle of meteoric dust therefore reaches its terminal
velocity in a fraction of a second. As the dust may have been
falling for hours we might as well ignore the equation of motion
and assume that small particles of meteoric dust always fall
with the terminal velocity.
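
The approach to the terminal velocity can be illustrated numerically. The sketch below is an addition for illustration only: it takes $g = 981$ cm/sec², a value assumed here and not quoted in the lecture, together with the terminal velocity of 18 cm/sec from the text, and compares the closed-form solution of equation (1) with a simple step-by-step integration. The function names are hypothetical.

```python
import math

G = 981.0             # cm/sec^2, assumed value of g
U_TERMINAL = 18.0     # cm/sec, the value of mg/k at the 80-km level (from the text)
TAU = U_TERMINAL / G  # characteristic time m/k in seconds

def velocity(t, v0):
    """Closed-form solution of m dv/dt = mg - kv with v(0) = v0."""
    return U_TERMINAL + (v0 - U_TERMINAL) * math.exp(-t / TAU)

def velocity_euler(t, v0, steps=20000):
    """The same quantity by forward-Euler integration of dv/dt = g - v/tau."""
    dt = t / steps
    v = v0
    for _ in range(steps):
        v += dt * (G - v / TAU)
    return v

print("characteristic time (sec):", TAU)
for t in (0.01, 0.05, 0.1, 1.0):
    print(t, velocity(t, 0.0), velocity_euler(t, 0.0))
```

Whatever the starting speed, the computed velocity is indistinguishable from 18 cm/sec after a small fraction of a second, which is the justification for ignoring the equation of motion in what follows.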

Variation of terminal velocity with height. We have assumed
in the foregoing analysis that the atmosphere through which
the dust falls is uniform. In fact, because air is compressible,
the density in an atmosphere at uniform temperature varies
with height $z$ exponentially. If $N$ is the air density

$$N = N_0 e^{-z/H},$$

where $N_0$ is the density at height $z = 0$ and $H$ is a constant
called the scale height. Near the mesopause $H = 4.5$ km,
implying that the air density changes by a factor $e^{-1}$ over a
vertical height of about 3 miles. The resistance to motion
caused by collision with air molecules is proportional
to the air density, so that the terminal velocity is inversely
proportional to it. Thus if $U(z)$ denotes the terminal velocity

$$U(z) = U_0 e^{z/H}, \tag{2}$$

where $U_0$ is the terminal velocity at height $z = 0$ (the 80-km level):

$$U_0 = 18 \text{ cm/sec.}$$


Time to fall from infinity. By integrating the equation

$$\frac{dz}{dt} = -U(z)$$

(which is left as an exercise for the reader) we find that
the particle will fall from $z = \infty$ to $z = 0$ in the finite time
$H/U_0 = 4.5 \times 10^5/18 = 2.5 \times 10^4$ sec; that is about 7 hours.
Thus, dust particles would follow a descending turbopause in a
time of about half a day.
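
The exercise can be checked numerically. Separating variables in $dz/dt = -U_0 e^{z/H}$ gives the time to fall from a height $z_0$ to $z = 0$ as $(H/U_0)(1 - e^{-z_0/H})$. The sketch below, added for illustration, compares this with a direct numerical evaluation of the integral of $dz/U(z)$; the function names and the step count are assumptions.

```python
import math

H = 4.5e5    # scale height in cm (4.5 km, from the text)
U0 = 18.0    # terminal velocity in cm/sec at z = 0 (from the text)

def fall_time(z_start):
    """Time in seconds to fall from height z_start (cm) to z = 0,
    from integrating dz/dt = -U0 * exp(z / H) in closed form."""
    return (H / U0) * (1.0 - math.exp(-z_start / H))

def fall_time_numeric(z_start, steps=20000):
    """The same time from a midpoint-rule evaluation of the integral of dz / U(z)."""
    dz = z_start / steps
    return sum(dz / (U0 * math.exp((i + 0.5) * dz / H)) for i in range(steps))

for km in (5, 20, 100):
    z = km * 1.0e5
    print(km, fall_time(z), fall_time_numeric(z))
print("from infinity:", fall_time(math.inf), "sec =",
      fall_time(math.inf) / 3600.0, "hours")
```

Almost all of the time is spent in the last few scale heights, where the air is densest and the terminal velocity smallest.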

Density variation. The meteoric dust is supposed to be falling 
from interplanetary space at a constant rate of $f$ g/cm² per
second. Thus, if $n$ is the dust density (in g/cm³) at any level and
$v$ is the dust velocity at that level, in the steady state,

$$nv = f.$$

But we know that the speed of fall is always close to the
terminal velocity $U(z)$ given by (2). Thus, at any level $z$, the
density of dust is

$$n = (f/U_0) e^{-z/H}. \tag{3}$$

This shows that the density of dust increases downwards in 
the high atmosphere to the turbopause, where the assumptions 
which we have made break down. 


7. The dust density below the turbopause

Below the turbopause the atmosphere is well mixed. As we
do not know the rate of mixing we assume that it is instantaneous.
If dust were unable to stick to the earth’s surface (100
km below the turbopause) the dust density at time $t$, assuming
no dust in the lower atmosphere initially, would be

$$n_{\text{lower atmos}} = n_{la} = \frac{ft}{80 \text{ km}} \text{ g/cm}^3. \tag{4}$$

But just above the turbopause, at height $z$, equation (3) gives
the value of $n$. Thus

$$\frac{n}{n_{la}} = \frac{80 \text{ km}}{U_0 t} e^{-z/H}.$$

Converting to centimetres and putting $U_0 = 18$ cm/sec gives

$$\frac{n}{n_{la}} = \frac{8 \times 10^6}{18 t} e^{-z/H} \approx \frac{4 \times 10^5}{t} e^{-z/H}.$$

The lower atmosphere will therefore “fill up” in a time of
about $4 \times 10^5$ sec (about 4 days) unless some way exists of
removing dust.

Suppose that in the well-mixed lower atmosphere dust is
being transported with average speed $v$ (say). Then $v n_{la}$ g of
dust will come into contact with 1 cm² of ground in 1 sec. If all
these particles stick (as seems likely) the steady state will be
reached when

$$v n_{la} = f = n U_0 e^{z/H}.$$

Thus

$$\frac{n}{n_{la}} = \frac{v}{U_0} e^{-z/H}. \tag{5}$$


It follows that if the average speed of transport v greatly 
exceeds the value of U 0 ( 18 cm/sec) there will be a ledge of 
dust formed at the turbopause. 


8. The full theory

Figure 2.8 shows the variation with height of the dust
density. The two cases when (a) the turbopause is at 100 km,
(b) the turbopause coincides with the mesopause, at 80 km,
are compared. Above the turbopause, in the still air, the graph
has an exponential shape and the density drops off rapidly with
height. Below the turbopause the density has a constant value
(under the assumptions we have made). Comparing (a) and
(b) we see that the descent of the turbopause from its usual
level of 100 km to the 80-km level would cause an increase of
the dust density at the mesopause.

Fig. 2.8. The distribution of dust with height in the cases when
(a) the turbopause lies at 100 km, (b) the turbopause has descended
to the mesopause (80 km).

Thus an exceptionally low mesopause temperature could 
lead to 

(1) an increase of dust at the mesopause owing to the 
descent of the turbopause; 
(2) increased mixing below the turbopause which might
transport water vapour up to the mesopause at an 
increased rate, where it would condense on the dust 
particles. 

It appears that one phenomenon, the abnormally low temperature,
is capable of explaining all the others, and the whole
sequence of events can be offered as a theory of night shining 
clouds. 

Finally, it should be pointed out that some scientists believe 
that NLC are primarily ice clouds, the dust particles serving 
merely as condensation nuclei. Thus there are other theories 
of NLC, some features of which are not inconsistent with the 
above picture. 













CHAPTER 3 


NUMBERS MADE TO MEASURE 

B. Fishel 

§ 1. If we calculate the length of the diagonal of the square
of side 1 by using Pythagoras’ theorem we obtain the value
$\sqrt{2}$. Now $\sqrt{2}$ is a length which cannot be measured, for what
do we do when we measure? We make the beginning and the
end of the length coincide with two marks on a ruler, count
the number of intervening marks, and say that the length is
“$p$ $q$-ths”. Suppose that

$$\sqrt{2} = p/q,$$

where $p/q$ is in its lowest terms;
then

$$p^2 = 2q^2,$$

and so 2 divides $p^2$. But 2 must then divide $p$, for it cannot be
written as the product of two factors one of which could divide
one of the $p$’s, the remaining factor dividing the other $p$ (2
is “prime”). Since 2 divides $p$ we can write

$$p = 2r,$$

and so

$$2^2 r^2 = p^2 = 2q^2,$$
$$q^2 = 2r^2.$$

We can now prove, as before, that $q = 2s$, and so $p$ and $q$
have the common factor 2, contrary to our supposition that
$p/q$ is in its lowest terms.
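
The conclusion can be illustrated, though of course not proved, by a direct search. The sketch below looks for whole numbers p and q with p² = 2q² for every denominator up to a chosen bound; the function name and the bound are illustrative choices only.

```python
from math import isqrt

def fractions_squaring_to_two(max_q):
    """Return every (p, q) with 1 <= q <= max_q and p*p == 2*q*q."""
    hits = []
    for q in range(1, max_q + 1):
        p = isqrt(2 * q * q)      # the only possible candidate for p
        if p * p == 2 * q * q:
            hits.append((p, q))
    return hits

# The argument in the text predicts that this list is empty for any bound.
print(fractions_squaring_to_two(100_000))
```

A search of this kind can only ever cover finitely many denominators; it is the argument by common factors that settles the matter for all of them at once.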

Pythagoras was the leader of a religious-cum-philosophical 
sect, and the discovery that the diagonal of the unit square 
cannot be measured was such a shock to its members that 
they were sworn to secrecy, and were forbidden to reveal the 
shameful knowledge to the uninitiated. Why was it such a 
shameful matter? The Greeks of that time, the sixth century
before Christ, believed that all significant quantitative relations 
in nature could be described in terms of whole numbers (natural 
numbers). They believed, for example, that the radii of the 
spheres, centred on the earth (!), on which the different planets 
moved, were in whole-number ratios to each other, and that so 
were the lengths of the harp strings which sounded harmonious 
chords. (The latter was a fact rather than a belief.) It is worth 
noting that the Greeks’ belief about the radii of planetary 
orbits was still held at the end of the sixteenth century by 
Kepler, one of the founders of modern astronomy. 

Since natural numbers were so important, anything that 
could not be expressed in terms of them was unspeakable—or, 
as we should say nowadays, cannot be defined in terms of the 
natural numbers in a finite number of steps. 

I am sure that you have never had any worries of this sort
about numbers like $\sqrt{2}$, $\sqrt{3}$ and $\pi$, which you probably call
“irrationals”. Suppose, however, we open up this sort of
question. Where shall we start from? Let us take it for granted,
just as the Greeks did, that there are no difficulties about the
natural numbers, that we know what they are, that they are
“natural”. What sort of numbers arise next? The fractions, or
rational numbers, and then the negative numbers. Instead of
looking at them in this order let us instead consider the
negative numbers first. The negative whole numbers arise in
the following way. We want to solve the equation x + b = a,
where a and b are natural numbers. If a ≥ b then x = a − b is
a natural number and solves the equation. If a < b there is no
natural number which solves the equation. We need new
numbers, and we shall “make them to measure”.

§ 2. Whatever sort of animals these new numbers are going to
be, the one which solves

x + b = a

in the case

b > a

must obviously depend on a and b. We shall use the symbol
(a, b), called a “pair”, to denote the solution, whatever it is,
and try to find the sort of rules that pairs must obey.

If

x + b = a,

then

x + b + l = a + l

for all l, so that we must have

(a, b) = (a + l, b + l).

This can be expressed in the form:

(a, b) = (c, d) means a + d = b + c, (E)

for if

c = a + l and d = b + l

clearly

a + d = a + b + l = b + a + l = b + c,

and, conversely, if

a + d = b + c,

we have

c = a + (d − b), d = b + (d − b) if d > b,

so that d − b plays the part of l, and

a = c + (b − d), b = d + (b − d) if d ≤ b,

so that we have

a = c + m, b = d + m

with m = b − d.
Note that equality of pairs as defined by (E) has three important
properties, without which it would not be very useful.
The first of these asserts that every pair is equal to itself. This 
is not a matter which is so obvious that even to talk of proving 
it is absurd. We must not allow the overtones of the word 
“equality” to make us forget that the meaning of “equality”, 
for pairs, is contained entirely in the definition (E). 

To prove that (a, b) = (a, b) we apply the definition, replacing
c and d by a and b, and we require

a + b = b + a,

and this is true.

The second of the properties is that

if (a, b) = (c, d) then (c, d) = (a, b).

This is so because

if a + d = b + c then c + b = d + a.

This property says that the fact of (a, b) being written first
in the equality

(a, b) = (c, d)

is not significant.

The final property is that

if (a, b) = (c, d) and (c, d) = (e, f)

then

(a, b) = (e, f).

This is so because if

a + d = b + c and c + f = d + e,

then

a + d + c + f = b + c + d + e,

and so

a + f = b + e,

i.e. (a, b) = (e, f).

These three rules show that we can work with equality of 
pairs in a perfectly “reasonable” fashion. 
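
The three properties can also be checked mechanically on a small stock of pairs. The sketch below represents a pair as a Python tuple of two natural numbers and tests rule (E) exhaustively for entries from 0 to 4; the representation and the name `pairs_equal` are illustrative assumptions, not the author's notation.

```python
from itertools import product

def pairs_equal(p, q):
    """Rule (E): (a, b) = (c, d) means a + d = b + c."""
    (a, b), (c, d) = p, q
    return a + d == b + c

# All pairs whose entries lie between 0 and 4.
pairs = list(product(range(5), repeat=2))

# Every pair equals itself.
reflexive = all(pairs_equal(p, p) for p in pairs)

# If p = q then q = p.
symmetric = all(
    pairs_equal(q, p)
    for p, q in product(pairs, repeat=2)
    if pairs_equal(p, q)
)

# If p = q and q = r then p = r.
transitive = all(
    pairs_equal(p, r)
    for p, q, r in product(pairs, repeat=3)
    if pairs_equal(p, q) and pairs_equal(q, r)
)

print(reflexive, symmetric, transitive)
```

A finite check of this sort is no substitute for the short proofs above, but it does guard against a slip in writing down the definition.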

How shall we add pairs? 

If

x + b = a and y + d = c

then

x + y + b + d = a + c,

so that if

x = (a, b) and y = (c, d)

then

x + y = (a + c, b + d).

We therefore add pairs by means of the formula

(a, b) + (c, d) = (a + c, b + d). (A)

What sort of properties has addition got?

Since

(c, d) + (a, b) = (c + a, d + b)

and

c + a = a + c, d + b = b + d

(a very familiar property of the natural numbers), our definition
of equality of pairs shows that

(a, b) + (c, d) = (c, d) + (a, b).

How do we add three pairs together? We have defined 
addition only for two pairs. We first add two of the pairs, 
and then add their sum to the third. But this can be done in 
two ways, thus: 

{(a, b) + (c, d)}+(e,f) and (a, b) + {(c, d) + (e,f)}. 

We can in fact prove that both ways give the same result. Try 
it for yourself. 

There is a “zero pair”, namely the pair (0, 0) or any pair (a, a),
since all such pairs equal (0, 0):

(a, b) + (0, 0) = (a + 0, b + 0) = (a, b).

Before we see how to multiply pairs we must pose the
following question: we introduced pairs in order to solve
x + b = a when b > a; we can now pose the equation for X:

X + (p, q) = (r, s).

Can it be solved by a pair X? If not, we are no better off
than before.

We require a pair $X = (x_1, x_2)$ such that

$$(x_1, x_2) + (p, q) = (r, s),$$

i.e.

$$(x_1 + p, x_2 + q) = (r, s).$$

This will be so if

$$x_1 + p + s = x_2 + q + r,$$

and this equation has the solution

$$x_1 = q + r, \qquad x_2 = p + s,$$

where $x_1$ and $x_2$ are, clearly, natural numbers.
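
As a small illustration, the addition rule (A) and the solution just found can be written out and tested. The tuple representation and the function names below are assumptions made for the sketch; only the formulas come from the text.

```python
def pairs_equal(p, q):
    """Rule (E): (a, b) = (c, d) means a + d = b + c."""
    (a, b), (c, d) = p, q
    return a + d == b + c

def add_pairs(p, q):
    """Rule (A): (a, b) + (c, d) = (a + c, b + d)."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def solve(B, A):
    """Return a pair X with X + B = A.

    For B = (p, q) and A = (r, s) the text gives x1 = q + r, x2 = p + s.
    """
    (p, q), (r, s) = B, A
    return (q + r, p + s)

B = (7, 2)      # behaves like the number 5
A = (1, 4)      # behaves like the number -3
X = solve(B, A)

print(X, add_pairs(X, B), pairs_equal(add_pairs(X, B), A))
```

Here X comes out as (3, 11), which is equal in the sense of (E) to (0, 8); this is the pair one would expect to stand for −8.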

This shows that if we work with pairs we can always solve 
equations like X + B = A, whatever the pairs A and B may be. 
Perhaps therefore we ought to use pairs, rather than natural 
numbers, for arithmetic. However, we are accustomed to single 
symbols like a, or —a (where a is a natural number), and it is 
rather late in mathematical history to think of making such a 
fundamental change. Is there some way in which, while using 
the pairs that we have just introduced, we can represent them 
by symbols like a and —a, a being a natural number? 

We recall that if a ≥ b the solution of x + b = a is a natural
number; on the other hand, if we solve the equation by means
of a pair we obtain (a, b). So that perhaps the pairs (a, b),
with a ≥ b, “behave like the natural numbers”. What does
that mean? It means that each such pair can be associated
with a natural number, different pairs get different natural
numbers, and each natural number gets one such pair. Moreover,
when we add such pairs we may just as well add the
corresponding natural numbers, for the pair that we obtain
by addition is the one associated with the sum of the corresponding
natural numbers. Now this is in fact true, provided
we remember that pairs like (p, q) and (p − q, 0) are equal (we
are, of course, dealing with pairs for which p ≥ q). This last
remark means that we have only to consider the pairs (a, 0).
We associate with (a, 0) the natural number a. Since

(a, 0) + (b, 0) = (a + b, 0),

and (a + b, 0) is associated with a + b, we see that the pairs
(a, b), where a ≥ b, do indeed behave like the natural numbers.
(The technical expression is “are isomorphic with”: iso =
same, morphe = shape.)

For any pair (a, b) we have either a ≥ b, when the pair
corresponds to a natural number, or a < b. In the latter case
the pair equals (0, b − a), a pair of the form (0, p) (where p
> 0), so that there are two sorts of pairs. This becomes
interesting: it begins to look as though we are on the trail
of the “negative natural numbers”, or negative integers as
they are called. This is pretty well confirmed when we notice
that

(a, 0) + (0, a) = (a, a) = (0, 0),

since, as (0, 0) is the zero pair, this means that (0, a) is a sort
of “−(a, 0)”. In fact the (0, a) behave like (are isomorphic
with) what you have always called the negative integers;
you can easily check this.

We have now reached a point where we can see how to use 
pairs while at the same time retaining the notation you have
been accustomed to. Suppose we write (a) for (a, 0), and −(a)
for (0, a). (Notice that −(a) is not (−a) because we still do not
know what −a is. It is certainly not a natural number.) We
have seen that (a) behaves like a, and if

x + (b) = (a),

then

x + (b) + (−(b)) = (a) + (−(b)),

so that, since

(b) + (−(b)) = (b, 0) + (0, b) = (0, 0) = (0),

$$x = (a) + (-(b)) = (a, b) = \begin{cases} (a - b) & \text{if } a \ge b, \\ -(b - a) & \text{if } a < b. \end{cases}$$

But this is what you have always done with negative integers:

x + b = a,

$$x = \begin{cases} a - b & \text{if } a \ge b, \\ -(b - a) & \text{if } a < b. \end{cases}$$

If we agree to omit + from +(−(b)) so that we just write −(b),
we now obtain the solution of

x + (b) = (a)

as

x = (a) − (b).



Finally, why bother about ( )? Why not just write a for the
pair (a, 0) and −b for the pair (0, b)? We can then solve

x + b = a

by

x = a − b,

and you can sigh with relief that everything is just as it was
before; but remember, a is a slovenly way of writing (a),
which is an abbreviation for (a, 0), and −b should really be
−(b), which denotes (0, b)!

§ 3. Now what about multiplication? First of all, if

x + b = a, (1)

then

dx + db = da and xd + bd = ad,

so that

d(a, b) = (da, db) = (a, b)d,

and then, if

y + d = c, (2)

multiplying equations (1) and (2) we obtain

xy + xd + by + bd = ac,
xy + (da, db) + (bc, bd) + bd = ac,
xy + (da, db) + (bc, bd) = (ac, bd),
xy = (ac, bd) + (db, da) + (bd, bc)
   = (ac + db + bd, bd + da + bc)
   = (ac + bd, bc + ad).
We therefore multiply by means of the rule

(a, b) × (c, d) = (ac + bd, bc + ad).

It is easy to verify that

(a, b) × (c, d) = (c, d) × (a, b),

((a, b) × (c, d)) × (e, f) = (a, b) × ((c, d) × (e, f)),

and that

(a, b) × ((c, d) + (e, f)) = (a, b) × (c, d) + (a, b) × (e, f),

so that multiplication of pairs is a “reasonable” sort of
multiplication.

We can now give a proper proof of the relation

(−1) × (−1) = 1.

This is a very important theorem which has necessarily to be 
glossed over, rather, at the stage at which you first came 
across it at school. This is so because negative numbers are 
presented in a descriptive fashion which does not lend itself 
to the rigorous sort of proof that is needed when we reach a 
higher level of sophistication. 

First of all, what is −1? It is −(1), or (0, 1). Our rule for
multiplication then shows that

(0, 1) × (0, 1) = (0·0 + 1·1, 1·0 + 0·1)
                = (1, 0)
                = (1)
                = 1.
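
The multiplication rule is easily tried out. In the sketch below a pair is again a tuple of two natural numbers; the helper names, and the range of entries used for the exhaustive check of the distributive law, are illustrative assumptions.

```python
from itertools import product

def pairs_equal(p, q):
    """Rule (E): (a, b) = (c, d) means a + d = b + c."""
    (a, b), (c, d) = p, q
    return a + d == b + c

def add_pairs(p, q):
    """Rule (A): (a, b) + (c, d) = (a + c, b + d)."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def mul_pairs(p, q):
    """(a, b) x (c, d) = (ac + bd, bc + ad)."""
    (a, b), (c, d) = p, q
    return (a * c + b * d, b * c + a * d)

minus_one = (0, 1)
square = mul_pairs(minus_one, minus_one)   # works out to (1, 0), i.e. the number 1

# Check the distributive law on all pairs with entries 0..3.
small = list(product(range(4), repeat=2))
distributive = all(
    pairs_equal(mul_pairs(p, add_pairs(q, r)),
                add_pairs(mul_pairs(p, q), mul_pairs(p, r)))
    for p, q, r in product(small, repeat=3)
)

print(square, distributive)
```

As a further check, (2, 5) × (3, 1) gives (11, 17), which equals (0, 6): the pair standing for −3 times the pair standing for 2 is the pair standing for −6.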



That disposes of the negative integers. How can we define 
the fractions? We have to solve the equation 

xb = a

when b is not a divisor of a. As before, we work with a pair,

x = {a, b}.

But I leave the development of the theory to the reader! (It 
is not very different from that of the negative numbers.) 
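
One possible way of carrying out this exercise is sketched below. The rules used are assumptions chosen by analogy with (E) and with the multiplication of pairs, and are not taken from the text: {a, b} = {c, d} is taken to mean ad = bc, the product is {a, b} × {c, d} = {ac, bd}, and the natural number n is represented by {n, 1}. The second entry of a pair is assumed never to be 0.

```python
def frac_equal(p, q):
    """Assumed rule: {a, b} = {c, d} means a*d = b*c."""
    (a, b), (c, d) = p, q
    return a * d == b * c

def frac_mul(p, q):
    """Assumed rule: {a, b} x {c, d} = {a*c, b*d}."""
    (a, b), (c, d) = p, q
    return (a * c, b * d)

def solve_product(b, a):
    """Return a pair x with x * b = a, for natural numbers a and b (b not 0)."""
    return (a, b)

a, b = 7, 3                      # 3 is not a divisor of 7
x = solve_product(b, a)          # the pair {7, 3}
check = frac_mul(x, (b, 1))      # x times b, with b written as {b, 1}

print(x, check, frac_equal(check, (a, 1)))
```

Under these assumed rules {7, 3} × {3, 1} = {21, 3}, which equals {7, 1}, so the pair does solve the equation.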

§ 4. What remains to be done in order to provide us with all 
the numbers we need for everyday mathematics? We need 
complex numbers in order to be able to solve equations like 

$$x^2 - 2ax + b^2 = 0$$

when $b^2 > a^2$, and a, b are rational numbers. The development
of a theory of suitable numbers is rather more complicated than
in the two previous cases, although it can still be done by
means of simple algebra. We also need, as we saw right at the
beginning of this chapter, numbers to solve

$$x^2 = 2,$$

and numbers to express the ratio of the circumference of a
circle to its diameter ($\pi$), and the sum ($e$) of the infinite series

$$1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots.$$

But that is another, longer, and more difficult task. 
CHAPTER 4


SPECIAL RELATIVITY: 

A QUESTION OF TIME RECKONING 

C. W. Kilmister 


This chapter falls into two parts. In the first part I want to
describe a particular approach to special relativity which was
implicit in Einstein’s original development in 1905, was made
explicit by Milne in 1938, and has acquired much wider
currency as a result of Bondi’s recent advocacy. I shall begin
by making certain assumptions and show how the theory
follows in a very direct and simple way from these assumptions.
However, amongst the consequences of these assumptions
are some which people have found very unpalatable.
Accordingly in the second part we shall return to the 
assumptions and show that we really had no choice but to 
make them. 

The idea of a single universal time ordering of events is a 
very advanced one. We cannot appeal to observation to justify 
it directly, for there are many instances which appear to contradict
it. For example, a distant observer sees a flash of
lightning long before he hears the thunder, although a nearer
observer can hardly detect any time interval. We can even
witness a reversal of order when someone watches soldiers
drilling from a distance and sees the men suddenly move
before he hears the word of command which caused the movement.
Of course, if we insist on our idea of a single time order,
we can explain these apparent anomalies in terms of the speed 
of sound, but the point here is that the original idea is an 
abstraction from experiments, and one of a rather subtle kind. 
Einstein, in 1905, seems to have been the first person to realize 
that the idea of distant events being simultaneous with each 
other was something which needed careful consideration. He 
was able to show at that time that the only rules which one 
could give, for the determination of such simultaneity, gave 
criteria which were subjective, in the sense that different 
observers would form different conclusions about which events 
were simultaneous. 

The question of assigning times to distant events is tied up 
with the way in which one gets information about these events. 
It was known from very early times that sound travelled with a 
certain speed, for this was the cause of echoes. For a long time, 
however, light was thought to travel instantaneously (although 
Empedocles, 490-435 b.c., spoke of light as travelling and was 
criticized by Aristotle for it). In 1638 Galileo described an 
experiment which would suffice to determine the speed of 
light, if such a speed existed. He suggested that two persons 
equipped with lighted lanterns and a shutter should stand 
several miles apart and as soon as one sees the light of the 
other he uncovers his light. The speed of light, however, is so 
great that such an experiment stands very little chance of 
success, and in fact Galileo only tried it at a distance of less 
than a mile. The general principle behind Galileo’s experiment 
was, however, the basis of the first experimental determination 
of the speed of light. The planet Jupiter has a number of
satellites, and as a matter of fact the satellites were first 
observed by Galileo. In 1668 Cassini published tables of 
motion of these satellites which were recognized as fairly 
reliable, and the Danish astronomer Roemer studied the 
irregularities in the times of eclipse of these satellites in 1675. 


He was able to show that these irregularities were caused by
the light from certain eclipses having to travel a greater distance
to the earth than from others, because of the different
position of the earth in its orbit round the sun. He calculated 
that the time required for light to traverse the earth’s orbit 
was about 22 minutes; the correct value is about 17 minutes. 
The actual determination of the speed of light by Galileo’s 
method was finally performed in the laboratory by Fizeau. In 
his experiment light passed between the teeth of a rotating
toothed wheel, travelled several miles, and was then reflected
back along the same path. The wheel was rotated at such a
speed that the light reached the wheel again just when a tooth
was in its way so that no light was received back. The velocity
determined by this means, which is close to the latest determinations,
is about 300,000 kilometres per second.


The velocity of light 

A completely different development took place in 1873 
when Clerk Maxwell published his electromagnetic theory of 
light. He had formulated equations governing the electromagnetic
field, and had determined that these equations had
solutions in the form of waves. In 1887 Hertz generated these 
waves in the laboratory, and of course they are now quite 
familiar to us in the form of radio waves. But Maxwell also 
noticed that light was an instance of such waves, though the 
frequency is much higher than the radio waves which we are 
accustomed to. 

Einstein was, of course, familiar with Maxwell’s theory and 
his actual starting point in 1905 was a very interesting one. He 
observed that there were solutions of Maxwell’s equations 
corresponding to a wave moving with a certain velocity, which 
could be determined entirely by electromagnetic means and 
which agreed closely with the measured velocity of light. 
Consider, he then said, what such a wave seems like to an 
observer moving along with it with the speed of light. This
observer would see electric and magnetic waves of the same
kind as Maxwell had already been able to determine for the
wave solutions, with the exception that these fields were
not moving along but were standing. No such solutions of
Maxwell’s equations exist, and from this puzzle Einstein was
led at once to the idea that the speed of light is a rather particular
quantity in physics whose status is quite different from
that of the speed of sound. This fact was at once connected by 
him with the difficulty we have mentioned before of assigning 
times to distant events. 

This assigning of time can only be done in terms of information
which we receive about those events, and this information
comes to us principally by means of light or other
electromagnetic radiations. It is true that we get a certain 
amount of information by means of sound or even by touching 
objects, but all these methods of deriving information convey 
it from place to place with a much smaller speed than that of 
light. Speaking from a practical point of view we know of no 
way of transmitting information more swiftly than that of 
light. This is not to say that higher velocities than that of light 
can never occur; for example, light itself moves in a medium 
whose refractive index is less than 1 with a speed greater than 
its speed in free space. But a careful analysis of the actual 
transmission of information in this case by a packet of waves 
shows that the information is still carried more slowly than the 
velocity of light in free space. 

Before we investigate exactly how to give this special quality 
to the velocity of light, let us see how it pays off. 

Since then we are to have a constant value for the speed of 
light, it will be much more convenient in what follows to adopt 
only one fundamental unit, say that of time, and to define 
lengths in terms of this unit. That is, we adopt one second as 
the unit of time and the distance that light could travel in one 
second as the unit of distance (about 186,000 miles). In these 
units the speed of light will then always be 1, so that its
constancy is incorporated from the very beginning.

Let us now see how two observers O, O' in motion relative 
to each other will assign times and distances to a distant event. 
We shall draw a diagram to represent the motion of the 
observers. In this subject it has become customary to draw the 
time-axis up the page, and I shall continue this custom. We 
will then have room for one space-axis, which we take across 
the page, and relative to these axes a ray of light will be 
represented by a line bisecting the angle between them. 

It is convenient to look at everything from the point of view of one
of these observers O, and the diagram for his motion will 
then be a straight line along the time-axis simply representing 
the passage of time as measured by him (Fig. 4.1). The 
observer, O, can now determine the relative motion of the 
observer O'. It is convenient to take the zero of both observers’ 
time reckonings as their instant of coincidence. O sends a
signal at time $t_1$ which is received and reflected by the second
observer at time $t_2'$ and returns to the first one at time $t_3$. We
know that the first observer determines the distance of the
second one as $x_2 = \tfrac{1}{2}(t_3 - t_1)$, and he also assigns to the time
of reflection the value $t_2 = \tfrac{1}{2}(t_3 + t_1)$. Now the time at which
the second observer receives the signal from the first, $t_2'$,
depends only on their relative motion and the time when the
signal was sent. If O sends the signal at time $2t_1$, these times
being measured from the instant at which the two observers
were coincident, it is clear that, if the curve representing the
second observer is a straight line, he will receive the signal
from O at time $2t_2'$. In general the ratio of $t_2'$ to $t_1$ is a certain
constant which is determined by the speed with which the
observers are separating. We shall call this constant $k$ and we
then have the relation $t_2' = kt_1$. Since the velocity of light is
constant for all observers, it follows in exactly the same way
that the ratio $t_3/t_2'$ of the times of reception and transmission
of the returning signal is also constant, depending only on
the velocity of separation of the two observers and therefore
having the same value as the previous ratio. We can therefore
express the time and distance of the event (as assigned by
O) which is the reflection of the light at the second observer,
in terms of $t_1$ in the following form

$$t_2 = \tfrac{1}{2}(k^2 + 1)\,t_1, \qquad x_2 = \tfrac{1}{2}(k^2 - 1)\,t_1.$$

If we divide the distance by the time we get an expression
for the velocity $v$ with which the observers are separating and
so we have

$$v = \frac{k^2 - 1}{k^2 + 1}.$$
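
The bookkeeping of the signal times can be followed step by step in a few lines. The sketch below works in the units of the text (speed of light equal to 1); the function names and the sample value k = 2 are illustrative choices.

```python
def reflection_event(k, t1):
    """Time and distance that O assigns to the reflection at O'.

    t1 is the emission time on O's clock; k is the constant ratio
    between reception and emission times for the two observers.
    """
    t2_prime = k * t1          # reception time on the clock of O'
    t3 = k * t2_prime          # return time on the clock of O
    t2 = 0.5 * (t3 + t1)       # time O assigns to the reflection
    x2 = 0.5 * (t3 - t1)       # distance O assigns to the reflection
    return t2, x2

def speed_from_k(k):
    """Speed of separation v = (k^2 - 1) / (k^2 + 1)."""
    return (k * k - 1.0) / (k * k + 1.0)

t2, x2 = reflection_event(2.0, 1.0)
print(t2, x2, x2 / t2, speed_from_k(2.0))
```

With k = 2 the reflection is assigned the time 2.5 and the distance 1.5, so the observers separate at three-fifths of the speed of light.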


We notice that $v$ is certainly less than 1, so that the constant
velocity introduced by the axioms of the last chapter is also
a maximum velocity of separation of observers. The constant
$k$ is the one which it is most convenient to use in what follows
to specify the relation between the observers, but for historical
reasons we shall also translate the results into the corresponding
ones in terms of the speed of separation because this
is how they are most usually quoted.


The Einstein velocity formula 

Let us now go on and consider two other observers O' and
O". If the corresponding values of $k$ are respectively $k$ and
$k'$ then the number that the time of emission of a signal from
O must be multiplied by to get the time of reception at O"
is evidently $kk'$ (Fig. 4.2). Thus the $k$’s combine together
multiplicatively, whereas one would have expected velocities
to be added together. If we use the expressions for $k$ and $k'$ in
terms of the velocities we derive at once, for the resultant
velocity $V$,

$$V = \frac{v_1 + v_2}{1 + v_1 v_2}.$$
For the resultant of two velocities $v_1$ and $v_2$ in Newtonian
Mechanics the corresponding expression would be simply the 
numerator of this. The denominator differs from 1 by a term 
which is a product of the two velocities but, of course, since 
we have made the speed of light unity, this product will be 
very small for all the speeds which we are accustomed to in 
everyday life. If we want to use the original units for space 
and time we should have to divide the product of velocities 
in the denominator by the square of the speed of light so that 
for most terrestrial applications the denominator is effectively 
unity. It is therefore not surprising that Newton was led into 
no trouble by supposing that he could unambiguously assign 
a time to distant events, and so use the sum of two velocities 
as their resultant. The present formula (the so-called Einstein 
velocity formula) has some very direct applications. 
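
The step from the product of the k’s to the velocity formula can be verified numerically. The sketch below assumes the relation v = (k² − 1)/(k² + 1) between k and the speed of separation and inverts it to obtain k from v; the inversion and the function names are additions made for the illustration.

```python
import math

def speed_from_k(k):
    """v = (k^2 - 1) / (k^2 + 1), in units where the speed of light is 1."""
    return (k * k - 1.0) / (k * k + 1.0)

def k_from_speed(v):
    """Inverse of speed_from_k for 0 <= v < 1: k = sqrt((1 + v) / (1 - v))."""
    return math.sqrt((1.0 + v) / (1.0 - v))

def compose_via_k(v1, v2):
    """Combine two speeds by multiplying the corresponding k's."""
    return speed_from_k(k_from_speed(v1) * k_from_speed(v2))

def einstein_formula(v1, v2):
    """V = (v1 + v2) / (1 + v1 * v2)."""
    return (v1 + v2) / (1.0 + v1 * v2)

print(compose_via_k(0.5, 0.5), einstein_formula(0.5, 0.5))   # both routes
print(einstein_formula(1e-7, 1e-7))                          # everyday speeds
```

Two speeds of one half combine to four-fifths rather than to 1, while for speeds as small as those of everyday life the result is indistinguishable from the simple sum.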



Fig. 4.3. 


One of these is to the experiment of Fresnel, in which light
is passed through a stream of water moving with speed $v$ and is
found to have a velocity intermediate between $c/n$ ($n$ the
refractive index), which it would have if the water were at
rest, and $c/n + v$, which it would have if it were carried along
by the water. If we consider Fresnel’s experiment according
to an observer who is moving with the water the velocity of
light will be $c/n$ according to this observer. When we transfer
to an observer fixed relative to the source of the light, we
shall have to combine with this speed the speed of the water,
and so we get the expression

$$\frac{c/n + v}{1 + v/(nc)},$$

which is to a sufficient approximation

$$\frac{c}{n} + v\left(1 - \frac{1}{n^2}\right),$$

the result found experimentally by Fresnel. Although this 
experiment is less accurate than some of those used to test 
the special theory of relativity it has the great advantage of 
being very direct and simple. 
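
To see how good the approximation is, the two expressions can be compared for plausible figures. In the sketch below the refractive index 1.33 for water and the stream speed of 7 metres per second are assumed values chosen only for illustration; the speed of light is the round figure quoted earlier in the chapter.

```python
c = 3.0e8     # speed of light in m/sec (about 300,000 km/sec)
n = 1.33      # refractive index of water (assumed value)
v = 7.0       # speed of the water in m/sec (assumed value)

# Einstein velocity formula in ordinary units, combining c/n with v.
exact = (c / n + v) / (1.0 + v / (n * c))

# The approximate expression, with the factor (1 - 1/n^2).
approx = c / n + v * (1.0 - 1.0 / (n * n))

print(exact - c / n)    # extra speed acquired from the moving water
print(approx - c / n)   # the same, from the approximate expression
```

The light gains less than half of the speed of the water, and the two expressions differ only by a quantity of order v²/(nc), which is far below anything such an experiment could detect.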

You will be wondering what are the unpleasant characteristics
of the theory since so far everything I have described has
been so successful. I myself do not believe that any unpleasant
consequences have yet been found, but objection
has been raised by some people to the following: some years 
ago it was pointed out, not for the first time, that if a space 
traveller leaves his twin brother behind on the earth, goes to 
a great distance at a high speed, and then returns, he will be 
younger than the twin who has stayed at home. There is no 
paradox (although this is often called the clock paradox) since 
it is evident that there is no symmetry between the twins. One 
of them has undergone a large acceleration. We can look at 
it from an economic point of view: it is immensely more 
expensive to send a man out to a distant planet than to provide 
his brother with an armchair in the laboratory. Moreover, the 
two brothers do not give similar descriptions of events. Let 
us imagine they are both billiard players. The stay-at-home 
brother finds that at all times during the experiment he has a 
most satisfactory game. The traveller, when he returns, says 
“most of the time the game went very well, but round about 
the middle there was considerable trouble”. 





Of course it behoves us in a country with such severe
economic difficulties to look around for some piece of good 
fortune which will reduce the immense expense of carrying 
out this experiment, and for the present we will consider the 
following form in which the edge of the paradox is slightly 
blunted, but the form has the advantage of being completely 
describable in terms of what we have done already. Suppose 
that an observer O is at rest, as we have considered before, and 
at time $t = 0$ on his clock an observer O' passes him with a
speed v. O' moves off to a great distance and while he does 
so O sends signals to him and receives back reflections. At 
a great distance O' has the good fortune to meet another 
observer O" who is moving towards O with exactly the same 
speed as O' is moving away. At the moment when they are 
coincident these two observers check their time reckonings 
against each other and O" then returns and passes O again as
shown in Fig. 4.4. The event of the meeting of O' and O" is
duly observed by O who has sent out a signal to it at time
$t_1$, so that O' observes the time to be $kt_1$, and therefore O'
assigns the total time for the journey as $2kt_1$. The light signal
from the meeting of the astronauts returns to O at time
$t_2 = k^2 t_1$ and from the symmetry of the figure it is obvious
that O assigns the time $t_1 + t_2$ to the journey. Now $1 + k^2$ is
always greater than $2k$ as can be seen from the fact that

$$1 + k^2 - 2k = (1 - k)^2 > 0.$$
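
A numerical instance may make the comparison concrete. The sketch below simply evaluates the two journey times for a given k and emission time; the function name and the sample values are illustrative, and k is assumed to be greater than 1 (observers separating).

```python
def journey_times(k, t1):
    """Total journey time as assigned by O and as recorded by the travellers.

    t1 is the time on O's clock at which the signal to the meeting event
    is sent; k is the ratio for the outgoing observer.
    """
    stay_at_home = (1.0 + k * k) * t1   # t1 + t2, with t2 = k^2 * t1
    travellers = 2.0 * k * t1           # k*t1 outward plus k*t1 on the return
    return stay_at_home, travellers

home, away = journey_times(2.0, 1.0)
print(home, away, home - away)
```

For k = 2, which corresponds to a speed of three-fifths that of light, the stay-at-home observer reckons 5 units of time for the journey and the travelling clocks only 4.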


Let us suppose that this experiment is a valid idealization of 
the experiment in which the observer O' leaves the observer 
O with a certain velocity, and at a great distance away turns 
rather sharply and comes back with the same velocity. This 
second experiment is obviously not identical with the first, 
since only two observers are concerned, but we are not at this 
point able to describe the way in which the observer turns 
round, since he must accelerate in order to do this and the 
consideration of accelerated observers comes later in this 
chapter. However, if the acceleration happens very quickly 
it is likely that the direct effect of the acceleration will be very 
small compared with its indirect effect in producing a change 
of speed, and therefore the original experiment gives us a 
close approximation of what happens in the second experiment. 
If this is so then the observer who has stayed at home will 
find that he is older than one who has been on the journey. 
This prediction is a surprising one, but it does not conflict 
with our experience since we have no experience of such long 
journeys at the high speeds necessary to show the effect. 

At the same time it leads to surprising results if we suppose 
that we can include biological systems among those things 
affected by the consequences of this theory. For example, if 
we imagine two caterpillars, one of them at rest with the 
observer, and the other moving off and returning, it is imaginable 
(according to the theory) that the stay-at-home caterpillar 
would be a butterfly when the returning wanderer was still 
only a caterpillar. The reason that this is called a paradox is 
that some people wrongly imagine that Einstein made 
some statement to the effect that all motion was relative and 
that therefore the two observers O and O' were equally 
entitled to their descriptions of the events. If this were so, 
then each caterpillar would have both to be a butterfly and to 
consider that the other must still be a caterpillar. It is obvious 
that there is no such symmetry between the two observers. 
One of the observers has remained at rest, the other had, at 
the extreme end of his journey, to undergo a violent acceleration. 
It is true that we neglect the direct effect of this acceleration 
on his clock, but its indirect effect is clear. Since only 
one of the observers has accelerated, it follows that their 
descriptions of the events are not equivalent and it is therefore 
not in the least paradoxical that one finds the other to be older, 
although it is, of course, a surprising and interesting consequence 
of the theory. Whether the measured time described 
here agrees with the time experienced by ageing of living 
creatures is perhaps open to question, but there is very direct 
evidence for the change in time reckoning with motion in 
cosmic ray physics. At the earth’s surface the meson showers 
observed have a soft component consisting of μ-mesons. 
These are charged particles about 207 times as heavy as the 
electron. In the laboratory such particles are found to have a 
life time of about two-millionths of a second, at the end of 
which time they decay into an electron and two neutrinos. It 
is known that these mesons are produced mainly in the upper 
atmosphere, about 10 km up, by high-energy collisions, and 
they then travel towards the earth with speeds very near to 
that of light. On Newton’s ideas of time, even if the particles 
travel with the speed of light they can only travel about 600 
metres before they decay, and they would therefore never be 
observed on the earth’s surface. However, the life-time 
quoted above is that for an observer at rest relative to the 
meson. If we consider an observer at rest on the earth’s 
surface, with respect to whom the mesons are moving very 
rapidly, the meson’s “clock” will be going slow by a factor of 
at least 10, and the mesons can therefore travel at least 6 km. 
The faster ones will have an even longer life-time and so can 
easily reach sea-level. 
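
The arithmetic of this paragraph can be checked with a few lines. The sketch below uses the rounded values quoted in the text (a life-time of two-millionths of a second and a speed of light of about 3 x 10^8 m/s) and, like the text, treats the life-time as a fixed travel time rather than a statistical mean; the function names are ours.

```python
C = 3.0e8               # speed of light in m/s (rounded)
REST_LIFETIME = 2.0e-6  # laboratory life-time in seconds, as quoted in the text

def range_without_dilation(speed=C):
    """Distance covered in one rest life-time on Newton's idea of time."""
    return speed * REST_LIFETIME

def range_with_dilation(factor, speed=C):
    """Distance covered when the meson's clock runs slow by the given factor."""
    return factor * speed * REST_LIFETIME

def factor_needed(distance, speed=C):
    """Slowing factor required for the meson to cover the given distance."""
    return distance / (speed * REST_LIFETIME)

print(range_without_dilation())    # about 600 metres
print(range_with_dilation(10))     # about 6 km
print(factor_needed(10_000))       # factor needed to descend 10 km
```

On these figures a meson produced 10 km up needs its clock slowed by a factor of roughly 17 to reach the ground within one life-time.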


Einstein’s assumption 

Let us go back to the earlier rule for assigning times and 
coordinates to distant events. Einstein answered his earlier 
problem about the electromagnetic wave by assuming that the 
speed of light is a universal constant for all observers or at least 
for all observers at rest in inertial frames. Such a hypothesis 
was far beyond the experimental data available in 1905, but 
the truth of the matter is that Einstein was here defining a 
convention for measuring time at distant points. What one 
requires of a convention is not so much agreement with experiment, 
which is more or less automatic, but a general consistency. 
The question of whether he achieved this consistency 
is still sometimes regarded as open by a few people, but by far 
the greater number believe that he did. 

Einstein assumed then, without too much question, that in 
order to carry out this process (which has been called by 
Bridgman “spreading time through space”) it was convenient 
and allowable to suppose that the velocity of light at all times 
and places was a universal constant, and to define the time of 
distant events by using this fact. 

Now it does seem, when setting up a theory of this kind, that 
this assumption is quite unexceptionable, for the light obviously 
plays a unique role in being the fastest signal which we know 
how to send. Moreover, it is well described by a particular 
theoretical set of equations, that is Maxwell’s equations, which 
predict indeed, if properly understood, that it will have this 
universal character for its velocity. However, these arguments 
are not so strong as they need to be, in view of the rather 
paradoxical nature of the results of the theory and the extreme 
opposition which has been raised against it in various quarters 
ever since 1905 when it was first produced: indeed this opposition 
seems in certain quarters to be increasing rather than 
diminishing, although it is true to say that most physicists 
accept the theory. Instead, therefore, of supposing that the 
velocity of light is a constant and so serves us as a uniquely 
convenient way of spreading time through space, we shall take 
a more general point of view and follow the line of Whitrow 
which appears in his book published in 1961.† In Whitrow’s 
treatment, instead of assuming anything about the velocity 
of light beforehand, we simply make a number of assumptions 
about the characteristics of signalling by means of light 
and establish criteria for the time of distant events that way. 

Assigning times and distances 

Consider the events consisting of an observer sending out 
a signal and having it reflected back from a distant event 
(Fig. 4.5); let the event which consists of his sending the signal 
be $E_1$, say, let it be reflected back at the event $E_2$, and received 
by him again at the event $E_3$. We must first make an assumption 
corresponding to the fact that the signals sent by light are the 
fastest that we know of, so we begin by assuming: 

Assumption 1. $t_3 > t_1$ unless, of course, $E_2$ is actually in 
the same place as $E_1$ and $E_3$, in which case $t_3 = t_1$. 

Further we suppose that the assignment of a time $t_2$ to the event 
is by a rule which determines $t_2$ uniquely from $t_1$ and $t_3$. This 
is expressed by 

Assumption 2. $t_2 = f(t_1, t_3)$. 

† G. J. Whitrow, The Natural Philosophy of Time. 










Here the mathematical notation simply implies the existence of 
a rule by which, when $t_1$ and $t_3$ are given, $t_2$ is specified. 

The problem, which we may call Einstein’s problem, is to 
determine exactly what this rule is, or rather to formulate 
assumptions which limit this rule rather considerably. We have 
then to determine the rule which the observer may use to 
assign a time to the event $E_2$ when he knows $t_1$ and $t_3$. (We 
shall call the signals which are being employed here light 
signals, not because we are here assuming that they must be 
light, but because it is convenient to have a name for them, and 
in practice light or electromagnetic radiation is that usually 
employed for determining distant events in this way, for 
instance by radar.) 

We now make two rather obvious assumptions about the 
behaviour of light: 

Assumption 3. There is only one light signal joining $E_1$ and $E_2$, and 
only one joining $E_2$ and $E_3$. 

Assumption 4. (a) If three events $E_1$, $E_2$, $E_3$ occur in that order 
along a light signal transmitted from $E_1$, then $t_3 > t_2$. 
(b) If $E_1$, $E_2$, $E_3$ occur in that order along a light signal 
received at $E_3$, then $t_2 > t_1$. 

It turns out, however, that we cannot solve Einstein’s 
problem completely unless we also consider how an observer 
assigns distances, as well as times to distant events. This 
leads to: 

Assumption 5. The distance assigned between two events 
on a particular light signal depends only on the times 
assigned to the events. 

Assumption 6. Distances so assigned, on a straight line, add 
up in the usual way. 

Assumption 7. The distance described by a light signal 
emitted from an observer A and reflected at a distant event 
is the same as that described by the signal on its return 
journey to A. 

In other words the observer A sets up these distances and 
times of distant events on the assumption that he is at rest. 

If he regarded himself as moving instead of at rest, then he 
would reject this axiom because he would say that when the 
light beam returned to him, it had to move either a greater or a 
lesser distance, because he had moved from his original position. 


The assigning function 

We must now introduce a little mathematical symbolism. In 
the first place we already have a notation for the rule which 
assigns a time $t_2$ to a distant event which is observed by a light 
signal sent from the observer at time $t_1$ and reflected and received 
back again at time $t_3$. This relation was written in the 
form $t_2 = f(t_1, t_3)$, this notation simply indicating that, when 
$t_1$ and $t_3$ are both given, $t_2$ is assigned. In the same way if $t_1$ and 
$t_2$ are the times which are theoretically assigned by A to two 
events on a light signal, we shall write for the distance apart 
which is assigned to them, the expression $g(t_1, t_2)$ where $g$ now 
indicates simply that this distance is defined if, and only if, we 
know both the times $t_1$ and $t_2$. From our assumption about the 
addition law for distances we now have a rule of the form 
$g(t_1, t_2) + g(t_2, t_3) = g(t_1, t_3)$, or what would be more suggestively 
written as $g(t_1, t_2) = g(t_1, t_3) - g(t_2, t_3)$. In this form we 
see that, for any particular value of $t_3$, which does not occur on 
the left-hand side, the expression for $g(t_1, t_2)$ is given as the 
difference of two expressions, one of which does not involve $t_2$ 
and one which does not involve $t_1$, so that $g(t_1, t_2)$ is equal to 
some new expression, say, $h(t_2) - h(t_1)$. 

Now let us use with this result the fact that the distance 
assigned to a distant event is the same in whichever direction 
the light is travelling, and suppose that the event $E_2$ is at a 
distance $R$ away; then we have the expression $2R = h(t_3) - h(t_1)$, 
so that $R$ is a half of $h(t_3) - h(t_1)$. In the same way we can see 
at once that $h(t_2)$ is a half of $h(t_3) + h(t_1)$, and this is actually 
an expression for the time assigned to $E_2$, for it comes to 

$$t_2 = h^{-1}\left\{\tfrac{1}{2}\left[h(t_1) + h(t_3)\right]\right\},$$

$h^{-1}$ being the inverse function of $h$, the argument of this function 
being the average of $h(t_3)$ and $h(t_1)$. The meaning of the 
inverse function is exactly the same as in various other parts of 
mathematics in which it occurs, for instance in trigonometry, 
that is the inverse function of $h$ is the quantity which is such 
that $h$ of it gives us the quantity under consideration. We know 
that such an inverse function will exist for the case of $h$ 
because it is clear that $h(t)$ is something which increases when 
$t$ increases. This is because $h$ occurs in measuring the distance 
of events along a light path, and we shall suppose that if 
certain events are at increasing times along a light path, then 
they are also at increasing distances. 
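
The step from the addition law to these two expressions can be written out explicitly. The lines below are a sketch which assumes that the sign of h is chosen so that assigned distances are positive, that is g(t_1, t_2) = h(t_2) - h(t_1); the outward and return distances are both R by Assumption 7.

```latex
\begin{align*}
h(t_2) - h(t_1) &= R  && \text{(outward journey, } E_1 \text{ to } E_2\text{)} \\
h(t_3) - h(t_2) &= R  && \text{(return journey, } E_2 \text{ to } E_3\text{)} \\
h(t_3) - h(t_1) &= 2R && \text{(sum of the two lines above)} \\
2h(t_2) &= h(t_1) + h(t_3) && \text{(difference of the first two lines)}
\end{align*}
```

Applying the inverse function to the last line gives the assigned time, and the third line gives the assigned distance.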

Clearly everything about the assigning of both times and 
distances to a distant event depends upon this quantity $h(t)$ 
which we have introduced, and we have only to determine 
what sort of functions can enter as $h(t)$. The first thing that we 
notice from the expressions for $t_2$ and $R$ is that if we take a new 
function, say $H(t)$, of the form $ah(t) + b$, where $a$ and $b$ are 
constants, then $a$ and $b$ will fall out of the equation for $t_2$; that is to 
say, they will not really be involved at all, and so the same 
times will be defined by $H(t)$ as by $h(t)$, and the same distances 
apart from the constant multiplier $a$. The 
converse of this result is also true, that is to say, if we have two 
functions $H(t)$ and $h(t)$ which assign the same times and distances 
to distant events, then $h$ and $H$ will be related in this 
linear fashion. We can see this as follows. We have that 

$$h^{-1}\left\{\tfrac{1}{2}\left[h(t_1) + h(t_2)\right]\right\} = H^{-1}\left\{\tfrac{1}{2}\left[H(t_1) + H(t_2)\right]\right\}$$

and so if we take a new function $F(t) = h\{H^{-1}(t)\}$ and take 
for our $t_1$ and $t_2$, $H^{-1}(x)$ and $H^{-1}(y)$, then we shall find that 
this new function $F$ satisfies the equation, for any $x$ and $y$, that 
$F$, of the average of $x$ and $y$, is the average of $F(x)$ and $F(y)$. 
So long as $F$ is a continuous function, that is to say, it has no 
breaks in its graph, which we should certainly assume for 
physical reasons in a case like this, it follows from this that 
$F(x)$ is a linear function of $x$, as we can see by drawing a graph 
of $F$ between two points. This means that $h\{H^{-1}(t)\}$ is a linear 
function of $t$ and so we can see at once that $h$ is a linear function 
of $H$. 
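
The first half of this argument is easy to try numerically. The sketch below assigns a time and a distance from a given increasing function h and its inverse, and then repeats the assignment with H = ah + b. The choice h(t) = exp(t) is only an illustration, not one the text proposes, and the function names are ours.

```python
import math

def assign(h, h_inv, t1, t3):
    """Time and distance assigned to an event lit by a signal sent at t1
    and received back at t3, for a given increasing function h."""
    t2 = h_inv(0.5 * (h(t1) + h(t3)))
    distance = 0.5 * (h(t3) - h(t1))
    return t2, distance

def affine_variant(h, h_inv, a, b):
    """Return H = a*h + b together with its inverse (a must be positive)."""
    def H(t):
        return a * h(t) + b
    def H_inv(x):
        return h_inv((x - b) / a)
    return H, H_inv

# h(t) = exp(t) is a hypothetical choice used only to exercise the formulas.
H, H_inv = affine_variant(math.exp, math.log, a=3.0, b=7.0)
print(assign(math.exp, math.log, 1.0, 3.0))
print(assign(H, H_inv, 1.0, 3.0))
```

The two assigned times agree, while the second distance is a times the first, which amounts to a change in the unit of length.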


Whitrow’s deduction of Einstein’s postulate 

It is now appropriate to incorporate two further assumptions, 
which are certainly always made about time-keeping, 
although they are perhaps slightly less difficult to reject than 
the earlier ones. 

Assumption 8. The time interval between two events 
assigned by an observer does not depend on how he sets 
the zero of his clock. 

Assumption 9. If the unit of time-reckoning is changed for 
the measured times, it is changed in the same way for the 
assigned times. 

Assumption 8 means that if $t_1$ and $t_3$ are both increased by an 
amount $k$, then $t_2$ must also be increased by an amount $k$, and so 
whenever 

$$h(t_2) = \tfrac{1}{2}\{h(t_1) + h(t_3)\},$$

then 

$$h(t_2 + k) = \tfrac{1}{2}\{h(t_1 + k) + h(t_3 + k)\}.$$

This is clearly an instance of the work which we have just done 
with $H$ and $h$; we simply choose for our $H(t)$, $h(t + k)$, and 
we therefore have from what we have just proved that $h(t + k)$ 
is some linear function of $h(t)$ which we can write as 

$$h(t + k) = p(k)h(t) + q(k),$$

where $p(k)$, $q(k)$ take the place of the constants in the previous 
discussion. They will now, of course, depend upon the particular 
value of $k$ which we have chosen. In particular, by 
choosing $t = 0$, we have 

$$h(k) = p(k)h(0) + q(k),$$

and since we can without any loss of generality take $h(0) = 0$, 
for this simply corresponds to the change of zero of the time 
reckoning, it will then be clear that $q(k) = h(k)$. We therefore 
have the equation 

$$h(t + k) = p(k)h(t) + h(k).$$

But, in this equation, $t$ and $k$ are simply any two numbers and 
we could therefore interchange them, that is to say write $k$ for 
$t$ and $t$ for $k$, so that we also have 

$$h(t + k) = h(k)p(t) + h(t).$$

By comparing these two results it follows at once that 

$$\frac{p(t) - 1}{h(t)} = \frac{p(k) - 1}{h(k)},$$

and since one side does not depend on $t$ and the other does not depend on 
$k$, both are constants, so that the function $p(t)$ is $1 + ah(t)$, 
where $a$ is a constant. Two possibilities arise here: the constant 
$a$ may be 0, or it may be non-zero. If the constant $a$ is 0 then 
the function $h$ has to satisfy 

$$h(t + k) = h(k) + h(t),$$

and this equation, which is actually a well-known one, has only 
one solution so long as we are prepared to allow $h$ only to be 
continuous and well-behaved in the manner to which we are 
accustomed for physical functions (that is to say, the functions 
which occur in physics). This fact is obvious if we first consider 
$h(2t)$, which will be $2h(t)$, and then $h(3t)$, which will similarly 
be $3h(t)$, and so on. Proceeding in this way it is easy to show 
that, at least for any number $x$ which can be expressed as the 
ratio of two integers, $h(xt)$ will be $xh(t)$. This will imply that 
$h(t)$ is actually proportional to $t$. The case in which $a$ is non-zero 
is a little more complicated, but in fact we will not need 
to consider this, for it is not only the independence of the zero 
of clock-setting which we wish to incorporate into our devices. 
We also want to incorporate Assumption 9 so that if, for instance, 
the numbers assigned to $t_1$ and $t_3$ are doubled then also 
the number assigned to the time $t_2$ by the observer is doubled. 
This will be found to require that the constant $a$ actually is 
zero. For if $a$ is not 0, then we can easily deduce something 
about the way the function $p$ must depend upon $t$ and $k$. If $a$ 
is not 0, we can see by comparing $h[t + (k + t)]$ and $h(2t + k)$ 
that 

$$p(t + k) = p(t)p(k),$$

from which it follows that $1 + ah(t + k)$ is given by 

$$1 + ah(t + k) = [1 + ah(t)][1 + ah(k)].$$

Thus $h$ must satisfy this “functional equation”. On the other 
hand, if the change of the time unit is not to make any difference, 
then in particular it will not make any difference if we halve the 
unit of time and so replace $h(t)$ by $h(2t)$ everywhere. Now 
$h(2t)$, from the original definition of $h(t + k)$, is $[1 + p(t)]h(t)$, 
and if we use the fact that $h(2t_2)$ must be the average of $h(2t_1)$ 
and $h(2t_3)$ we see at once that $p(t_2)$, $p(t_1)$ and $p(t_3)$ must all 
be equal. But this equality is for arbitrary values of $t_1$ and $t_3$ 
and therefore $p(t)$ must be a constant. In other words, the 
equation which we have written down to be satisfied by $p(t + k)$ as 
a product of $p(t)$ and $p(k)$ has to be satisfied by this constant 
value of $p$ and therefore $p$ must be either 1 or 0. Now $p$ cannot 
be 0 because in that case $h(t + k)$ would be the same as $h(t)$, 
and this would lead to no sensible determination of time at 
all, but simply to a trivial situation. Therefore $p$ must be 1 and 
$a = 0$, so that we are back in the previous case where $h(t)$ 
was proportional to $t$. From the way in which $h$ was introduced, 
then, the time assigned to a distant event which is illuminated 
by a signal leaving the observer at time $t_1$ and returning at time 
$t_3$ is $\tfrac{1}{2}(t_1 + t_3)$. This was the assumption made by Einstein and 
sometimes called Einstein’s postulate, but we see here how 
this postulate is really unavoidable if we make the rather 
obvious assumptions we have made about time-ordering and 
independence of zeros and units. 
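
The roles of Assumptions 8 and 9 can be seen by trying both cases numerically. The sketch below compares the rule obtained when h(t) is proportional to t with the rule obtained from a continuous solution of the case in which a is not zero, namely 1 + ah(t) = exp(ct), for which the assigned time works out as (1/c) log of the average of exp(c t_1) and exp(c t_3). The constant c and the function names are illustrative assumptions.

```python
import math

def assign_linear(t1, t3):
    """Assigned time when h(t) is proportional to t (the case a = 0)."""
    return 0.5 * (t1 + t3)

def assign_exponential(t1, t3, c=1.0):
    """Assigned time when 1 + a*h(t) = exp(c*t) (a solution of the case a != 0)."""
    return math.log(0.5 * (math.exp(c * t1) + math.exp(c * t3))) / c

def respects_zero_shift(rule, t1, t3, k):
    """Assumption 8: shifting the clock zero by k shifts the assigned time by k."""
    return math.isclose(rule(t1 + k, t3 + k), rule(t1, t3) + k)

def respects_unit_change(rule, t1, t3, factor):
    """Assumption 9: rescaling the measured times rescales the assigned time."""
    return math.isclose(rule(factor * t1, factor * t3), factor * rule(t1, t3))

for rule in (assign_linear, assign_exponential):
    print(rule.__name__,
          respects_zero_shift(rule, 1.0, 3.0, 5.0),
          respects_unit_change(rule, 1.0, 3.0, 2.0))
```

Both rules survive a change of zero, but only the linear rule survives a change of unit, in line with the conclusion that a must vanish.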

We have really no choice but to assign the time to distant 
events in this way. It is not an experimental fact but something 
which is already determined by the way in which we speak of 
times and distances. It also follows that the distance of the 
event, $R$, is $\tfrac{1}{2}(t_3 - t_1)$, apart from a constant multiplier, so that 
the observer necessarily has to assign to events a distance 
proportional to the difference of time that it takes a signal to 
go and return. This is equivalent to his assuming a constant 
velocity of light, which as we said before was exactly what 
Einstein did. We see that the constancy of the velocity of 
light is not so much an experimental assumption as a necessary 
consequence of the way that we think about times and distances. 
Of course, people did not always analyse the way in 
which they thought about times and distances in this way, 
because the idea that the velocity of light was a constant 
was a difficult one to reconcile with their normal ideas of 
velocity which they had derived by thinking about billiard 
balls or horses, or whatever mechanical devices they may 
have used to get their ideas of velocity from. The velocity 
of light must be a speed in rather a different sense from the 
speed of a cricket ball, for otherwise it would not have this 
peculiarly constant nature. None the less we are forced to 
assign to it this constant nature, even if it does mean a revision 
in our ideas of velocity, if we wish to retain our ideas of how 
one should assign times and distances to distant events. 


CHAPTER 5 


WALLPAPER PATTERNS 

H. Kestelman 


1. Introduction 

There are two distinct elements in a wallpaper: the design, 
or motif, which the designer creates, and the pattern, which is 
the way the motif repeats. The same applies to cotton prints 
and to curtains and to anything flat that carries what we usually 
call a repeat pattern. To define what is meant by a repeat 
pattern, and to make it amenable to mathematics, we shall 
presently introduce formal definitions and symbols, but 
roughly we may say that a wallpaper is a picture (or set of 
points in a plane) which is unaltered by certain transformations: 
more specifically, translations, rotations and mirror 
images. 

There is no limit to the number of motifs, but there are 
exactly seventeen patterns (see Diagram 1), and it is no more 
possible to find an eighteenth than it is to draw a rhombus 
whose diagonals are not perpendicular. Seventeen may seem 
small, but when we have examined the implications of pattern 
it will seem large. Although all seventeen were used by the 
Moors 600 years ago, it was not until 1891 that the limitation 
to seventeen was proved by the Russian mathematician 
Fedorov; it came as an afterthought to his discovery that in 
three dimensions there are just 230 patterns, a fact well known 
to crystallographers. The branch of mathematics most relevant 
to the discussion of pattern is the theory of groups, but for 
plane patterns we can manage well enough with elementary 
geometry and some novel types of argument which need not 
be called group theory. 

Before embarking on formal definitions of symmetry let 
us examine two familiar pictures: the graph of sin x (Fig. 5.1) 
and the honeycomb (Fig. 5.2). There are certain things we can 
do to these figures that do not alter their appearance but which 
would certainly alter the appearance of other pictures. 

If we shift the graph of sin x bodily in the direction of the 
x-axis an amount 2π, or if we rotate it 180° about the origin, 
or if we take the mirror image of the graph in the line x = ½π, 
the appearance of the picture is unaltered. There is still 
another type of transformation, called glide-reflection, that 
leaves the graph unaltered: if we translate the graph an amount 
π in the direction of the x-axis, the picture changes, but if we 
follow this by a mirror image in the x-axis the picture is 
restored to its original appearance. 
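
These four symmetries of the sine curve can be spot-checked numerically. The sketch below applies each transformation to a handful of sample points on the graph and tests whether the images still lie on it; a finite sample is of course an illustration rather than a proof, and the function names are ours.

```python
import math

def on_graph(point, tol=1e-9):
    """True if the point (x, y) lies on the graph y = sin x."""
    x, y = point
    return abs(y - math.sin(x)) < tol

def translate(point, dx):
    return (point[0] + dx, point[1])

def half_turn_about_origin(point):
    return (-point[0], -point[1])

def mirror_in_vertical_line(point, a):
    """Mirror image in the line x = a."""
    return (2 * a - point[0], point[1])

def glide(point):
    """Translate by pi along the x-axis, then mirror in the x-axis."""
    x, y = translate(point, math.pi)
    return (x, -y)

SAMPLES = [(x, math.sin(x)) for x in (-2.0, -0.5, 0.0, 0.3, 1.0, 2.5)]

def preserves_graph(transform):
    return all(on_graph(transform(p)) for p in SAMPLES)

print(preserves_graph(lambda p: translate(p, 2 * math.pi)))
print(preserves_graph(half_turn_about_origin))
print(preserves_graph(lambda p: mirror_in_vertical_line(p, math.pi / 2)))
print(preserves_graph(glide))
print(preserves_graph(lambda p: translate(p, math.pi)))
```

The first four checks succeed, while the last (translation by π alone) fails, as the text says it must.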

Fig. 5.2. 

The honeycomb (if we imagine it continued indefinitely) 
has numerous symmetries. If we rotate it through 60° about the 
centre of one of the hexagons, the picture is unaltered (consequently 
a rotation of 120° about the same point leaves 
the picture invariant); a rotation of 120° about a hexagon 
vertex leaves the picture unaltered, and so does a rotation of 
180° about the mid-point of any hexagon side. If we take 
a mirror image in a line through a hexagon centre which passes 
through a vertex, or bisects a side, of that hexagon, the picture 
is unaltered; it is invariant under any translation that carries 
one hexagon centre to another; midway between consecutive 
parallel mirror lines will be found axes of glide-reflection 
symmetry. It is no accident that this picture has no 90° 
rotational symmetry: we shall see that 120° and 90° rotational 
symmetry cannot coexist, nor can there be rotational symmetry 
involving angles other than 60°, 90°, 120° and 180°. The reason 
for these limitations is to be found in the existence of translational 
symmetry which we shall assume always to be present 
in wallpaper. 

If we restrict ourselves to patterns without rotational 
symmetry, or having at most centres of 180° rotational 
symmetry, we can easily construct repeat patterns in the 
following way. Take any two non-parallel vectors a and b and 
a parallelogram with sides equal to a and b, and let O be the 
centre of the parallelogram. Now draw any motif in the 
parallelogram and call it σ. The points P where 

$$\overrightarrow{OP} = ma + nb + \tfrac{1}{2}(a + b) \quad (m, n \text{ integral})$$

divide the plane into equal parallelograms in each of which we 
place a translated copy of σ (Fig. 5.3). If we denote by $T_t(\sigma)$ 
the set obtained by translating every point of σ an amount t 
(see §2), then S, the union of all the sets 

$$T_t(\sigma), \quad \text{where } t = ma + nb \quad (m, n \text{ integral}),$$

is a specimen of what we shall recognize as a wallpaper. It is 
easy to show that the whole set S has the property 

$$T_t(S) = S \quad \text{whenever } t = ma + nb \quad (m, n \text{ integral});$$

we say that S is invariant under all these translations. 

Had we taken the trouble to make σ “symmetric about O” 
(e.g. by drawing a line through O to bisect the parallelogram, 
drawing any picture in one of the halves and then taking σ to 
be this picture and the result of rotating it 180° about O) 
then the resulting S would not only have the translational 
symmetry, it would also have 180° rotational symmetry about 
O and about every point P, where 

$$\overrightarrow{OP} = \tfrac{1}{2}(ma + nb) \quad (m, n \text{ integral})$$

(Fig. 5.4); this is a little harder to prove, but it follows from 
Proposition 5 proved in § 5. 

Fig. 5.4. 
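
The construction just described can be imitated in a few lines. The sketch below works in lattice coordinates, describing each point by the coefficients (u, v) of u·a + v·b measured from O; this is an assumption of convenience, legitimate here because translations by ma + nb and half-turns about ½(ma + nb) act in the same way on these coefficients whatever a and b are. The motif is a hypothetical four-point set chosen to be symmetric about O, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction as F
import math

def reduce(c):
    """Shift c by an integer into the interval [-1/2, 1/2)."""
    return c - math.floor(c + F(1, 2))

# A hypothetical motif, symmetric about O, in lattice coordinates.
MOTIF = {(F(1, 4), F(1, 8)), (F(-1, 4), F(-1, 8)),
         (F(0), F(1, 3)), (F(0), F(-1, 3))}

def in_wallpaper(point):
    """True if the point lies in S, the union of all translated motifs."""
    u, v = point
    return (reduce(u), reduce(v)) in MOTIF

def translate(point, m, n):
    """Apply T_t with t = m*a + n*b."""
    return (point[0] + m, point[1] + n)

def half_turn(point, m, n):
    """Rotate 180 degrees about the point P with OP = (m*a + n*b)/2."""
    return (m - point[0], n - point[1])

# A finite sample of points of S taken from several parallelograms.
SAMPLE = [translate(p, m, n) for p in MOTIF
          for m in (-2, 0, 3) for n in (-1, 0, 2)]

print(all(in_wallpaper(translate(p, 5, -7)) for p in SAMPLE))
print(all(in_wallpaper(half_turn(p, 1, 0)) for p in SAMPLE))
```

Both checks succeed for this symmetric motif; with a motif that is not symmetric about O the translation check would still pass but the half-turn check would in general fail.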

The moment we try to introduce other angles of rotation 
into a wallpaper, the situation alters radically: the only possible 
angles are found to be 60°, 120° and 90°, and the vectors 
a and b can no longer be chosen at will. Other restrictions 
assert themselves if we try to introduce mirror or glide lines. 
We have now reached a position where we have to set up 
precise terminology to prove these results. 



2. Isometric transformations of a plane 

The simplest transformations encountered at school are 
congruence transformations: these are transformations that 
map any point P on to a point P' so that the distance between 
any pair A, B is the same as that between A' and B'. Such 
transformations are also called isometries; it will be useful 
to have a functional notation for isometries. 

(i) If t is any chosen vector in the plane, we define the 
translation $T_t$ to be the transformation mapping P on to P' so 
that $\overrightarrow{PP'} = t$; we write $T_t(P)$ for P' where it is convenient to 
do so (Fig. 5.5). 



Fig. 5.5. 


(ii) A transformation of “rotation about a point A through an 
angle θ”, denoted by $R^A_\theta$, assigns to each P the point P', where 
AP' is AP rotated θ about A: again we write $R^A_\theta(P)$ for P' 
(Fig. 5.6). 

There are two other isometries Ω: 

(iii) Ω(P) is the mirror image of P in a chosen line l of the 
plane: Ω is called a mirror (Fig. 5.7). 

Fig. 5.7. 

(iv) We choose a line l in the plane and a vector t parallel 
to l; the transformation which translates P an amount t and 
then reflects it in l is called a glide-reflection (Fig. 5.8). 

Fig. 5.8. 

It can be proved that every isometry of the plane is one of the 
four mentioned. 

The different kinds of isometry can be characterized by 
the way they transform line segments; suppose the line 
segment AB is transformed by an isometry Ω into A'B' (in 
general, if S is any set we denote by Ω(S) the set of the points 
Ω(P) with P in S). Then 

(i) if Ω is a translation, $\overrightarrow{AB} = \overrightarrow{A'B'}$, i.e. Ω leaves direction 
as well as length unaltered (Fig. 5.9); 

(ii) if Ω is the rotation $R^O_\theta$ then A'B' is AB turned through 
θ about O; there is only one point of the plane invariant 
under Ω and that is O (Fig. 5.10); 

Fig. 5.10. 

(iii) if Ω is a mirror in the line l, then AB and A'B' are 
equally, but oppositely, inclined to l; here all points of 
l (but no others) are invariant under Ω (Fig. 5.11); 

(iv) if Ω is a glide with axis l, AB and A'B' are equally but 
oppositely inclined to l; here no point of the plane is 
invariant under Ω; l is the only line which is invariant 
under Ω (Fig. 5.12). 



Fig. 5.12. 

If Ω is an isometry, we write $\Omega^{-1}$ for its inverse, i.e. the 
transformation which maps Ω(P) back on to P. Plainly 

$$(R^A_\theta)^{-1} = R^A_{-\theta}, \quad \text{and} \quad (T_t)^{-1} = T_{-t}.$$
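
For readers who like to experiment, the four kinds of isometry can be modelled with complex numbers standing for points of the plane. The sketch below is one possible encoding, not the notation of the text: a line is given by a point on it and a direction, angles are in degrees, and all function names are ours.

```python
import cmath
import math

def translation(t):
    """T_t: move every point by the vector t."""
    return lambda p: p + t

def rotation(centre, theta_deg):
    """Rotation through theta (degrees) about the given centre."""
    w = cmath.exp(1j * math.radians(theta_deg))
    return lambda p: centre + w * (p - centre)

def mirror(point_on_line, direction):
    """Mirror image in the line through point_on_line with the given direction."""
    u = direction / abs(direction)
    return lambda p: point_on_line + u * u * (p - point_on_line).conjugate()

def glide(point_on_line, direction, distance):
    """Translate along the line by the given distance, then reflect in it."""
    u = direction / abs(direction)
    reflect = mirror(point_on_line, direction)
    return lambda p: reflect(p + distance * u)

def preserves_distance(transform, a, b, tol=1e-9):
    return abs(abs(transform(a) - transform(b)) - abs(a - b)) < tol

A, B = 1 + 2j, -3 + 0.5j
for f in (translation(2 - 1j), rotation(1j, 75), mirror(0, 1 + 1j), glide(0, 1, 2)):
    print(preserves_distance(f, A, B))

# The inverse of a rotation is the opposite rotation about the same centre.
p = 0.3 - 1.7j
print(abs(rotation(1j, -75)(rotation(1j, 75)(p)) - p) < 1e-9)
```

Each transformation preserves the distance between the two sample points, and undoing a rotation by the opposite rotation returns the original point.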


2.1. Composition of isometries 

If $\Omega_1$ and $\Omega_2$ are isometries, we define the product $\Omega_1\Omega_2$ as 
the transformation mapping P on to $\Omega_1\{\Omega_2(P)\}$, i.e. we apply 
$\Omega_2$ to P and then $\Omega_1$ to the result (note the reversal of the 
natural verbal order). The only products we need to examine 
are the following: 

(i) $T_a T_b = T_{a+b}$ (this is obvious). 

(ii) Equal and opposite rotations about distinct points 
produce a translation: specifically 

$$R^A_\theta R^B_{-\theta} = T_t,$$

where t is the vector from B to $R^A_\theta(B)$ (Fig. 5.13). 



Fig. 5.13. 

That the resultant is a translation follows from the fact that 
$R^B_{-\theta}$ rotates any line segment through −θ and $R^A_\theta$ rotates it back 
the same amount, so that the segment is finally unaltered in 
direction. Knowing that the resultant is a translation, we can 
find t by applying the compound rotation to any one point and 
it is convenient to choose B since $R^B_{-\theta}$ leaves B invariant. 

(iii) If α + β is not an integral multiple of 360°, then $R^A_\alpha R^B_\beta$ 
is a rotation α + β about some point of the plane. 

This is because any line segment of the plane is rotated β by 
$R^B_\beta$ and then α by $R^A_\alpha$, so that all segments are rotated through 
α + β; hence the resulting isometry is a rotation about some 
point X. The position of X is easily found, bearing in mind 
that it is the one point left invariant by $R^A_\alpha R^B_\beta$. Let $l_1$ be the line 
AB, $l_2$ the result of rotating $l_1$ through −½β about B, and $l_3$ 
the result of rotating $l_1$ through ½α about A (Fig. 5.14). Since $l_2$ 
makes an angle ½(α + β) with $l_3$, it cuts $l_3$ in some point X. 
Plainly $R^B_\beta$ maps X on to X', its mirror image in $l_1$, and $R^A_\alpha$ maps 
X' on to its mirror image in $l_1$, i.e. on to X, which is therefore 
invariant under $R^A_\alpha R^B_\beta$. 
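
Products (ii) and (iii) can be verified on a small example. The sketch below again represents points by complex numbers, which is our encoding rather than the text's; it checks that equal and opposite rotations about A and B give the translation from B to the rotated image of B, and it locates the centre X of a product of two rotations as the fixed point of a map of the form p -> wp + c.

```python
import cmath
import math

def rotation(centre, theta_deg):
    """Rotation through theta (degrees) about the given centre."""
    w = cmath.exp(1j * math.radians(theta_deg))
    return lambda p: centre + w * (p - centre)

def product(left, right):
    """The product in the text's order: the right-hand factor acts first."""
    return lambda p: left(right(p))

def translation_vector(a, b, theta_deg):
    """t for the product of rotation theta about a and -theta about b:
    the vector from b to its image under the rotation about a."""
    return rotation(a, theta_deg)(b) - b

def centre_of_product(a, alpha_deg, b, beta_deg):
    """Fixed point X of (rotation alpha about a)(rotation beta about b),
    assuming alpha + beta is not a multiple of 360 degrees."""
    w = cmath.exp(1j * math.radians(alpha_deg + beta_deg))
    c = product(rotation(a, alpha_deg), rotation(b, beta_deg))(0)
    return c / (1 - w)

A, B = 0j, 1 + 0j
f = product(rotation(A, 90), rotation(B, -90))
t = translation_vector(A, B, 90)
print(all(abs(f(p) - (p + t)) < 1e-9 for p in (0j, 2 + 3j, -1.5 + 0.25j)))

X = centre_of_product(A, 90, B, 90)
print(X)
```

With A at 0, B at 1 and both angles 90°, the centre X comes out at the point (½, ½), which is where the lines $l_2$ and $l_3$ of the construction meet.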
2.2. Symmetry group 

If S is a plane set of points, there may be isometries Ω that 
leave S invariant, i.e. Ω(S) = S; these isometries constitute 
what is called the symmetry group of S, denoted by $\mathcal{G}(S)$. 
If $\mathcal{G}(S)$ includes translations $T_t$, we denote by $\mathcal{T}(S)$ the set of 
all such t. 

If $R^A_\theta(S) = S$ we say that A is a rotation centre of S, and if 
there is a smallest such positive θ it is easy to show that θ = 
360°/k with k an integer, and A is then called a k-centre of S. 
The honeycomb (Fig. 5.2) has two-centres, three-centres and 
six-centres; a circle with centre A is invariant under all rotations 
about A. It will be convenient to call a k-centre even when 
k is even and treble when k is divisible by 3. It follows that A 
is an even centre if and only if $R^A_{180^\circ}(S) = S$, and A is a treble 
centre of S if and only if $R^A_{120^\circ}(S) = S$. 
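
The k-centres of the honeycomb can be spot-checked by computer. The sketch below models the honeycomb by its set of hexagon vertices, taken (as an assumption of this illustration) to be the points m + nω with ω the unit vector at 60° and m − n not divisible by 3; the remaining lattice points are then the hexagon centres. It tests only that a finite sample of vertices is carried to vertices, so it illustrates the claims rather than proving them.

```python
import cmath
import math

OMEGA = cmath.exp(1j * math.pi / 3)   # unit vector at 60 degrees

def lattice_coords(z, tol=1e-9):
    """Integers (m, n) with z = m + n*OMEGA, or None if z is not a lattice point."""
    n = z.imag / (math.sqrt(3) / 2)
    m = z.real - n / 2
    mi, ni = round(m), round(n)
    if abs(m - mi) > tol or abs(n - ni) > tol:
        return None
    return mi, ni

def is_vertex(z):
    """True if z is a hexagon vertex of the model honeycomb."""
    coords = lattice_coords(z)
    return coords is not None and (coords[0] - coords[1]) % 3 != 0

VERTICES = [m + n * OMEGA for m in range(-4, 5) for n in range(-4, 5)
            if (m - n) % 3 != 0]

def rotation(centre, theta_deg):
    w = cmath.exp(1j * math.radians(theta_deg))
    return lambda p: centre + w * (p - centre)

def maps_vertices_to_vertices(transform):
    return all(is_vertex(transform(v)) for v in VERTICES)

print(maps_vertices_to_vertices(rotation(0, 60)))                  # hexagon centre
print(maps_vertices_to_vertices(rotation(1, 120)))                 # hexagon vertex
print(maps_vertices_to_vertices(rotation((1 + OMEGA) / 2, 180)))   # mid-point of a side
print(maps_vertices_to_vertices(rotation(0, 90)))                  # no 90-degree symmetry
```

In this model the origin behaves as a six-centre, the vertex at 1 as a three-centre and the mid-point of a side as a two-centre, while the 90° rotation fails.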

It is clear that if $\Omega_1$ and $\Omega_2$ belong to $\mathcal{G}(S)$ then so does 
$\Omega_1\Omega_2$; in particular, if a and b belong to $\mathcal{T}(S)$ so does ma + nb 
for all integers m and n. It is also clear that if $\Omega \in \mathcal{G}(S)$ then 
$\Omega^{-1} \in \mathcal{G}(S)$ since $\Omega^{-1}(S) = \Omega^{-1}\{\Omega(S)\} = S$. 

We are now in a position to define a wallpaper as a plane set 
S with the following two properties: 

(i) $\mathcal{T}(S)$ includes vectors in non-parallel directions; 

(ii) there is a shortest non-zero vector in $\mathcal{T}(S)$ (there may be several 
vectors in $\mathcal{T}(S)$ with this shortest length: in Fig. 5.2 
the vector joining the centres of two hexagons with a 
common side has six possible directions). 

Condition (i) disqualifies the graph of sin x from being a wallpaper, 
since for this set S the vectors in $\mathcal{T}(S)$ are all parallel 
to the x-axis. Condition (ii) is perhaps hard to appreciate. A 
set S (not a wallpaper) can be invariant under arbitrarily small 
translations, e.g. any straight line is invariant under any translation 
parallel to the line. A little thought will show that if we 
allowed a set S with arbitrarily small non-parallel translations 
to be called a wallpaper, it would be one in which paper was 
hardly visible at all, since every patch of paper, however 
small, would include points of S. 

The symmetry group of a wallpaper S sets up a relation
between the points of its plane: we say that A and B are equivalent
(in relation to $\mathcal{S}(S)$) if $B = \Omega(A)$ for some $\Omega$ in $\mathcal{S}(S)$.
A set of points F in the plane with the property that every
point of the plane is equivalent to exactly one point of F is
called a fundamental region of $\mathcal{S}(S)$. For instance, in Fig. 5.4
any line through O divides the basic parallelogram into two
trapezia and either of these is a fundamental region for this
wallpaper.

In some intuitive sense it seems clear that the appearance 
of a wallpaper S as seen from a point A is the same as that seen 
from any other point equivalent to A; consequently 

(iii) if A is the centre of a rotation $\theta$ leaving S invariant, the
same is true of every point equivalent to A, and

(iv) if $t \in \mathcal{T}(S)$, then “t rotated $\theta$” also belongs to $\mathcal{T}(S)$.

Similarly, if l is a mirror or glide line of symmetry of S and
$t \in \mathcal{T}(S)$, then

(v) the mirror image of t in l also belongs to $\mathcal{T}(S)$.

Formal proofs of these fundamental facts are given in Appendix 1.
It follows from (iii) and the formula (ii) of § 2.1 that the
rotation centres of S cannot be arbitrarily close, for if they
were then $\mathcal{T}(S)$ would include arbitrarily small vectors,
contrary to the definition of a wallpaper.


3. The restriction of the rotations in a wallpaper 

We now prove the fundamental proposition (known as the 
plane crystallographic restriction). 
Proposition 1. If a wallpaper S is invariant under a rotation
$\theta$ and $0 < \theta < 180^\circ$, then $\theta$ must be 60° or 120° or 90°.


Proof. Let OA be a shortest in $\mathcal{T}(S)$ and let OA' = −OA
(Fig. 5.15). Take B so that OB is OA rotated $\theta$ about O. Since
$OB \in \mathcal{T}(S)$ by § 2.2 (iv), it follows that AB (= OB − OA) and
A'B (= OB − OA') are both in $\mathcal{T}(S)$ and are not shorter than
OA; this implies that $60^\circ \le \theta \le 120^\circ$. Suppose $60^\circ < \theta < 120^\circ$,
i.e. $\theta = 90^\circ + \alpha$ with $|\alpha| < 30^\circ$. $\mathcal{T}(S)$ includes OB advanced $\theta$,
i.e. OA advanced $180^\circ + 2\alpha$, and also the negative of this, say
OC, which is OA advanced $2\alpha$; since $|2\alpha| < 60^\circ$, AC < OA,
and since $AC \in \mathcal{T}(S)$ it follows that C = A, i.e. that $\alpha = 0$,
which means that $\theta = 90^\circ$.
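
The two steps of this proof can be checked numerically. The following is only a sketch under stated assumptions: OA is taken as a unit vector, only whole-degree angles are tried, and the function names are illustrative rather than taken from the text.

```python
import math

def chord_lengths(theta_deg):
    """Lengths |AB| and |A'B| for a unit vector OA, with OA' = -OA and
    OB equal to OA rotated through theta_deg about O."""
    t = math.radians(theta_deg)
    bx, by = math.cos(t), math.sin(t)   # B, taking A = (1, 0)
    ab = math.hypot(bx - 1.0, by)       # |OB - OA|  = 2 sin(theta/2)
    a1b = math.hypot(bx + 1.0, by)      # |OB - OA'| = 2 cos(theta/2)
    return ab, a1b

def passes_first_step(theta_deg, tol=1e-9):
    """True when neither AB nor A'B is shorter than OA (length 1)."""
    ab, a1b = chord_lengths(theta_deg)
    return ab >= 1.0 - tol and a1b >= 1.0 - tol

def length_ac(theta_deg):
    """|AC| when OC is OA advanced 2*alpha and theta = 90 + alpha."""
    alpha = math.radians(theta_deg - 90)
    return 2.0 * abs(math.sin(alpha))   # chord subtending the angle 2*alpha

# Step 1: angles for which AB and A'B are not shorter than OA.
survivors = [d for d in range(1, 180) if passes_first_step(d)]

# Step 2: strictly between 60 and 120 degrees, AC is shorter than OA,
# so it must vanish; only alpha = 0 allows that.
final = [d for d in survivors if d in (60, 120) or length_ac(d) < 1e-9]
```

Among the whole-degree angles tried, the first step leaves exactly the range 60° to 120°, and the second leaves only 60°, 90° and 120°.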


Proposition 2. Suppose S is a wallpaper of which A is a 
rotation centre. 


(i) If A is a six-centre, then the rotation centre of S nearest
to A is a two-centre; A is the centre of a regular hexagon
whose vertices are all three-centres and whose sides are all
bisected by two-centres (Fig. 5.16).



(ii) If A is a four-centre, then the rotation centre of S nearest 
to A is a two-centre; A is the centre of a square whose vertices 
are all four-centres and whose sides are all bisected by two- 
centres (Fig. 5.17). S has no treble centres.



Fig. 5.17. 


Proof. Let B be a rotation centre of S nearest to A.


(i) From the construction of Fig. 5.14, $R^A_{60^\circ}$ would combine
with a rotation about B of 90° or 120° to give a rotation centre
closer to A than B. Hence B must be a two-centre. Now
$R^B_{180^\circ} R^A_{60^\circ} = R^D_{-120^\circ}$, where ∠BAD = 30° and ∠ABD = 90°. Thus
D is a treble centre of S, but it cannot be a six-centre, for if it
were we could combine rotations of 60° about A and D to get a
rotation centre at a point E on AB with ∠ADE = 30°. The
assertion (iii) of § 2.2 then completes the proof.

(ii) S cannot have treble centres, since a rotation of 120°
would combine with the rotation of −90° to give a 30° rotation
in $\mathcal{S}(S)$ contrary to Prop. 1. B must then be a two-centre or
else a four-centre, but the latter would combine with the
four-centre at A to give a rotation centre closer to A than B; thus B
is a two-centre. The point M such that ∠MAB = 45° and
∠MBA = 90° is then a four-centre, being the centre of $R^A_{90^\circ} R^B_{180^\circ}$.
The proof is then completed by (iii) of § 2.2 using rotations
about A.


4. Lattices 

We have now to examine how the rotations in $\mathcal{S}(S)$ affect
the translations in $\mathcal{T}(S)$. The basic facts we use are

(i) if a and b are in $\mathcal{T}(S)$ then so is ma + nb (m, n integral);

(ii) not all vectors in $\mathcal{T}(S)$ are parallel to each other;

(iii) $\mathcal{T}(S)$ has a shortest member, say $t_0$.

If u and v are given non-parallel vectors in a plane, the set of
all vectors mu + nv with m and n integral is called a plane vector
lattice of which u, v form a basis: we denote it by $\mathcal{L}[u, v]$,
and it clearly has the property that if a and b are in it then so is
ma + nb for all integers m and n, a property it shares with the
translation set $\mathcal{T}(S)$ of a wallpaper. In fact we have

Proposition 3. If S is a wallpaper then $\mathcal{T}(S)$ is a lattice.

Proof. We choose any point O and form the set $L_1$ of all
points P with OP in $\mathcal{T}(S)$; the vector joining any two points of
$L_1$ must then belong to $\mathcal{T}(S)$. By (iii), the point $P_0$ with
$OP_0 = t_0$ will be a point of $L_1$ closest to O. By (ii) there are
points of $L_1$ not on $OP_0$ and among these there must be a $P_1$
which is closest to O (this depends on the fact that any circle
centred at O includes only a finite number of points of $L_1$ on
account of (iii)). Clearly $\mathcal{L}[OP_0, OP_1]$ is part of $\mathcal{T}(S)$, and we
proceed now to show that it is the whole. Let $L_2$ be the set of
all points X with

$$OX = m\,OP_0 + n\,OP_1 \qquad (m, n \text{ integral}).$$

$L_2$ divides the plane into parallelograms each having sides
equal to $OP_0$ and to $OP_1$. Let Z be any point of $L_1$ and suppose
it lies in the parallelogram $A_1A_2A_3A_4$ with (Fig. 5.18)

$$A_1A_2 = OP_0 \quad\text{and}\quad A_1A_4 = OP_1.$$



Fig. 5.18. 


Since the vectors joining Z to the points $A_r$ belong to $\mathcal{T}(S)$
it follows that if Z is not a vertex then it must lie in the
interior of the parallelogram; from the definition of $P_1$ it follows
that the distances of Z from the $A_r$ are all at least equal to $A_1A_4$.
It is left as an exercise for the reader to prove geometrically
that if a point Z is in the interior of a parallelogram then its
distance from at least one vertex is less than the longest side
of the parallelogram. It follows now that $Z \in L_2$ and completes
the proof that

$$\mathcal{T}(S) = \mathcal{L}[OP_0, OP_1].$$

4.1. Every vector lattice V has infinitely many bases, e.g.

$$\mathcal{L}[a, b] = \mathcal{L}[a + b, b]$$

since

$$ma + nb = m(a + b) + (n - m)b;$$

it can be proved (see Appendix 2) that if $\mathcal{L}[a, b] = \mathcal{L}[c, d]$ then
the parallelogram with sides a and b has the same area as that
with sides c and d. Among these many bases there is one of
particular importance, namely that in which a is a shortest in
V and b is a shortest among those in V which are not parallel to
a. The lengths of a and b are uniquely defined in this way, and
from what was asserted above it follows that the area and
therefore the angles of the parallelogram are unique (though
its orientation need not be so). We shall call this basis the
reduced basis of V and the corresponding parallelogram the
reduced basic parallelogram of V.
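
The area claim can be illustrated with a small calculation. This is a sketch under assumptions: the vectors a = (2, 0) and b = (1, 3) and the helper names are chosen here for illustration only. The first change of basis has mq − np = 1 and preserves the area; the second has mq − np = 2 and only spans a sub-lattice, so its parallelogram is twice as large.

```python
def det(u, v):
    """Signed area of the parallelogram with sides u and v."""
    return u[0] * v[1] - u[1] * v[0]

def combine(a, b, m, n):
    """The lattice vector m*a + n*b."""
    return (m * a[0] + n * b[0], m * a[1] + n * b[1])

a, b = (2, 0), (1, 3)                 # illustrative basis, not from the text

# c = a + b, d = b: here mq - np = 1*1 - 1*0 = 1, so [c, d] is again a basis.
c, d = combine(a, b, 1, 1), combine(a, b, 0, 1)

# e = 2a, f = b: here mq - np = 2, so [e, f] spans only a sub-lattice.
e, f = combine(a, b, 2, 0), combine(a, b, 0, 1)

area_ab = abs(det(a, b))
area_cd = abs(det(c, d))
area_ef = abs(det(e, f))
```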

A lattice $\mathcal{L}[a, b]$ is called diamond if a = b, i.e. if the basis
vectors have equal length (Fig. 5.19), and rectangular if a is
perpendicular to b. A diamond lattice in
which the angle between a and b is 60° or 120° is said to be
hexagonal, and a diamond rectangular lattice is said to be
square.
If a and b are inclined at angle $\theta$, the length of ma + nb is

$$\{m^2a^2 + n^2b^2\}^{1/2} \quad\text{if } \theta = 90^\circ$$

and

$$a\,(m^2 + n^2 + 2mn\cos\theta)^{1/2} \quad\text{if } a = b.$$

It is an easy deduction that if $\theta$ is 60° or 120° or 90°, then a, b is a
reduced basis of $\mathcal{L}[a, b]$.

Another useful deduction, relying on (iv) of § 2.2, is

Proposition 4.

(i) If a wallpaper S has a treble centre, then $\mathcal{T}(S)$ is hexagonal;
if a is a shortest in $\mathcal{T}(S)$, then a and “a advanced 60°”
form a reduced basis for $\mathcal{T}(S)$.

(ii) If a wallpaper S has a four-centre, then $\mathcal{T}(S)$ is square;
if a is a shortest in $\mathcal{T}(S)$, then a and “a advanced 90°” form a
reduced basis for $\mathcal{T}(S)$.


5. The positions of the rotation centres of a wallpaper 

If V is a vector lattice and O any chosen point, it will be
convenient to say that the set of all the points P with OP in
V is a point lattice supporting V. By (iii) of § 2.2 it follows that
if A is a k-centre of a wallpaper S then the point lattice which
supports $\mathcal{T}(S)$ and includes A consists of k-centres of S. The
set of all rotation centres of S is therefore a union of point
lattices supporting $\mathcal{T}(S)$, but it need not be a single point
lattice. The key theorem is the following.

Proposition 5. If S is a wallpaper then

(i) its even centres, if any, are a point lattice supporting
$\tfrac{1}{2}\mathcal{T}(S)$;

(ii) its treble centres, if any, are a hexagonal point lattice
supporting $\tfrac{1}{\sqrt{3}}\mathcal{T}(S)$ advanced 30°,

the meaning of $k\mathcal{T}(S)$ being defined below.
Proof. We say $v \in k\mathcal{T}(S)$ if $(1/k)v \in \mathcal{T}(S)$.

(i) Suppose O is an even centre of S. If $OA \in \tfrac{1}{2}\mathcal{T}(S)$ we have
to show that A is an even centre of S.

By hypothesis

$$T_{2OA}\,R^O_{180^\circ}$$

belongs to $\mathcal{S}(S)$; by (ii) of § 2.1 it is a half turn and it clearly
leaves A invariant, proving that A is an even centre.

Conversely, if B is an even centre of S we have to show
that $2\,OB \in \mathcal{T}(S)$ and this follows from (ii) of § 2.1 since

$$R^B_{180^\circ} R^O_{180^\circ} = T_{2OB}.$$

(ii) Suppose S has treble centres and that O, A are a pair of
treble centres whose distance apart is as small as possible
(as explained in § 2.2, this makes sense). If we take B so that
△OAB is equilateral then by (iii) of § 2.1 B is the centre of
$R^O_{120^\circ} R^A_{120^\circ}$ and is therefore a treble centre of S. Similarly the
reflection of O in AB is also a treble centre, and by continuing
this process we see that the hexagonal point lattice supporting
$\mathcal{L}[OA, OB]$ and including O consists of treble centres of S.
This point lattice divides the plane into equilateral triangles,
and if there were a treble centre not belonging to this lattice it
would lie in one of the triangles, so making its distance from a
vertex less than OA, and contradicting the definition of O
and A.

By (ii) of § 2.1,

$$R^A_{+120^\circ} R^O_{-120^\circ} = T_{OA'}$$

where BA' = 2BA; hence $OA' \in \mathcal{T}(S)$. If $\mathcal{T}(S)$ included a
vector t shorter than OA', then the corresponding translation
would combine with $R^O_{120^\circ}$ to give a treble centre at a point
whose distance from O is less than OA. Since this is impossible,
it follows that OA' is a shortest in $\mathcal{T}(S)$ and by § 4.2 $\mathcal{T}(S)$ is
hexagonal. Since OA is $\tfrac{1}{\sqrt{3}}OA'$ advanced 30° (Fig. 5.20), this
completes the proof.


6. Classification of wallpapers by rotations 

We show now that there are just five types of wallpaper 
when they are classified according to their rotations, and for 
each type we indicate a fundamental region. 

(i) If $\mathcal{S}(S)$ consists entirely of translations, then $\mathcal{T}(S)$ is
unrestricted (as in § 1); any basic parallelogram for $\mathcal{T}(S)$ is a
fundamental region.

(ii) If the only rotations in $\mathcal{S}(S)$ are half turns, then $\mathcal{T}(S)$ is
unrestricted (as in § 1). If O is a two-centre, then the set of
two-centres is a point lattice supporting $\tfrac{1}{2}\mathcal{T}(S)$ and including
O. As explained on p. 64, any “half” of a basic parallelogram
for $\mathcal{T}(S)$ is a fundamental region.

(iii) Suppose S has a four-centre at A. We know from Prop. 2
that all rotation centres of S are non-treble and that B, the
rotation centre nearest to A, is a two-centre. By Prop. 5 the
rotation centres support the square lattice $\tfrac{1}{2}\mathcal{T}(S)$. So, if we
know A and B, and we take C and C' (Fig. 5.21) so that
AC = 2AB, and AC' is AC advanced 90°, we see from Prop.
4(ii) that ACA'C' is a basic square for $\mathcal{T}(S)$. By Prop. 2(ii),
the corners of this square, and its mid-point M, are
four-centres, the mid-points of the sides are two-centres, and there



Fig. 5.21. 

are no more rotation centres in or on the square. This means 
that the four-centres and the two-centres each support the 
lattice yk (S). If the square ABMB' is rotated through 
±90° and 180°, it covers ACA'C', and every point of the
plane is equivalent (under a translation in $\mathcal{T}(S)$) to a point
of ACA'C'. A closer examination shows that ABMB' is a
fundamental region.

(iv) Suppose S has a six-centre at A and the nearest rotation
centre is B; then the whole pattern is determined by these two
points. To justify this statement, we argue again as in (iii).
$\mathcal{T}(S)$ is hexagonal and 2AB is a shortest in it, so that $\mathcal{T}(S)$ is
determined by A and B. We know (Prop. 2) that the rotation
centres of S are two-, three- and six-centres. Defining C, C',
M as in (iii), we can easily locate all rotation centres in the
basic parallelogram ACA'C'. The even centres support $\tfrac{1}{2}\mathcal{T}(S)$
and include A, the treble centres support $\tfrac{1}{\sqrt{3}}\mathcal{T}(S)$ advanced
30° [Prop. 5(ii)] and include A. Hence the six-centres, being
treble and even, are at A, C, A', C'; the two-centres bisect
the sides and diagonals of ACA'C', and the three-centres are
at the points D and D' which trisect the long diagonal (Fig.
5.22).

Every point of △ACC' is equivalent to a point of △ADC (by
a rotation of ±120° about D) and every point of △A'CC' is
equivalent to one in △ACC' (by a half turn about M). A closer
examination shows that △ADC is a fundamental region.

(v) If S has a three-centre but no six-centre, then all the
rotation centres of S are three-centres (for if there were an even
centre we would combine rotations of 180° and 120° to get a
60° rotation). So, if A and B are a nearest pair of three-centres,
they determine the hexagonal lattice of rotation centres
(Prop. 5) and the hexagonal lattice $\mathcal{T}(S)$ is determined by
AB advanced 30° (AC in Fig. 5.23). Taking C' so that △ACC' is
equilateral, the parallelogram ACA'C' is basic, and the rotation
centres in or on it are at the vertices and at the points
B, B' trisecting the long diagonal.

Every point of △ACC' is equivalent to one in △BCC' (under
a rotation ±120° about B) and every point of △A'CC' is
equivalent to one in △CB'C'. A closer examination shows that
the diamond BCB'C' is a fundamental region.


7. Conclusion 

Each of the five types discussed in § 6 is considered to be a single
pattern. Further variety can be obtained only by introducing 
mirrors and glides. Thus, in addition to the five patterns without 
“image elements”, each of these can give rise to others. It 
would take too long to explain in detail how the additional 
patterns are restricted: the principles are the same as those 
used in classifying according to rotation. There are in fact 
just twelve patterns incorporating mirrors or glides and these 
are all illustrated in Diagram 1. 


Appendix 1 

(i) If $t \in \mathcal{T}(S)$ and $\Omega \in \mathcal{S}(S)$, then $\Omega(t) \in \mathcal{T}(S)$: here
$\Omega(t)$ denotes t advanced $\theta$ if $\Omega$ is a rotation through $\theta$, and it
denotes the mirror image of t in l if $\Omega$ is a mirror or
glide-reflection with axis l.

(ii) If $R^A_\theta \in \mathcal{S}(S)$ and $\Omega \in \mathcal{S}(S)$, and $B = \Omega(A)$, then
$R^B_\theta \in \mathcal{S}(S)$.

Proof. (i) It is enough to show that $\Omega T_t \Omega^{-1}$ is a translation
$T_{t'}$ with $t' = \Omega(t)$. If for any point P we define P', P'' and P''' by

$$P' = \Omega^{-1}(P), \quad P'P'' = t, \quad P''' = \Omega(P''),$$

it is intuitive, from Fig. 5.24 when $\Omega$ is a rotation about O, and
from Fig. 5.25 when $\Omega$ is a mirror or glide-reflection with axis
l, that $PP''' = \Omega(t)$.

(ii) Since $\Omega R^A_\theta \Omega^{-1}$ is a rotation through $\theta$ or $-\theta$ (as in § 2), it
is enough to show that its centre is B, and this follows from the
equations

$$\Omega R^A_\theta \Omega^{-1}(B) = \Omega R^A_\theta(A) = \Omega(A) = B.$$
Appendix 2

If $\mathcal{L}[a, b] = \mathcal{L}[\alpha, \beta]$ then the parallelogram with sides a and
b has the same area as that with sides $\alpha$ and $\beta$.

Proof. Since $\alpha$ and $\beta$ belong to $\mathcal{L}[a, b]$,

(i) $\alpha = ma + nb$ and $\beta = pa + qb$

for suitable integers m, n, p, q. Take rectangular axes Oxy with
Ox parallel to a, and take points A, B, C, D so that

$$OA = a, \quad OB = b, \quad OC = \alpha, \quad OD = \beta.$$

We have to prove that △AOB and △OCD have equal area.
If $A = (a, 0)$ and $B = (b_1, b_2)$, then $C = (ma + nb_1, nb_2)$,
$D = (pa + qb_1, qb_2)$ and

$$2\triangle COD = \pm\{(ma + nb_1)qb_2 - (pa + qb_1)nb_2\} = \pm ab_2(mq - np).$$

It is therefore enough to prove that mq − np, which we denote
by j, is ±1. By (i),

$$q\alpha - n\beta = ja \quad\text{and}\quad m\beta - p\alpha = jb;$$

since $\mathcal{L}[\alpha, \beta]$ includes both a and b, it follows that m/j and n/j
are integers, and so, by (i), $(1/j)\alpha \in \mathcal{L}[\alpha, \beta]$, which requires 1/j
to be an integer; hence j = ±1.
CHAPTER 6


THE MATHEMATICS 
OF GAMBLING 

D. M. Burley 


Introduction 

Two games that have fascinated mathematicians for many 
years are the games of poker and tossing pennies. Their in¬ 
terest in poker has been mainly as a process to make money 
and although this is interesting, in my experience their success 
has been severely limited. Their interest in the game of tossing 
pennies, however, has been much more serious and produc¬ 
tive. The game has found many important and successful 
applications in models in kinetic theory and has been studied 
over the past 70 years or so by such eminent scientists as 
Lord Rayleigh, Einstein, Boltzmann, Markoff and many 
others. Their particular interest was in problems of diffusion 
and heat-flow; problems which were at the forefront of scien¬ 
tific thought at the turn of the century and are now quoted 
with their solutions as standard examples. 

For the present we will be content to set up some of the 
simpler mathematics used in this type of work and then to 
indicate briefly the applications mentioned above. For further 
reading three books are listed, with comments, at the end of 
this chapter. 
Statement of problem

The problem can be simply stated as follows. A player A 
starts with M pennies and a player B starts with L pennies; 
they take a further coin and toss it, if it lands heads B gives A a 
penny and if it lands tails A gives B a penny. After N tosses 
what is the chance that one of the players is bankrupt? 

Let us concentrate on the probable profit or loss of the player
A. At each throw of the coin we assume that the probability
that A wins is ½ and that he loses is also ½. (With little more
difficulty we can deal with bad pennies when the probability
of winning is p and of losing q with p + q = 1.)

After 1st throw A may have won +1 or −1,

After 2nd throw A may have won +2 or 0 or −2,

After 3rd throw A may have won +3, +1, −1, −3, etc.

We represent these possible winnings on a diagram shown in 
Fig. 6.1. Each game can be represented by a path on the 
diagram as indicated. 



Fig. 6.1. Diagram showing A's possible profit or loss after the first
few throws (horizontal axis: profit, from −5 to +5). Two special
games are indicated.

We see that if A's profit becomes L at any stage then B is
bankrupt and the game is over, and if A's loss becomes M
then A is bankrupt and again the game is over. Hence we have
to analyse the number of paths in Fig. 6.1 which cross the
vertical lines on the diagram showing A's profit as either +L
or −M.


Both players having infinite capital 

Although this game is rather futile, since neither player can 
win, we study it to set up the method that we shall use, and to 
get some idea of the problem involved and the answers we 
might expect. 

We try, firstly, to compute A's probable profit or loss after
N throws. From the total number of games that can be played,
which is $2^N$, we try to choose those games which end with a
profit P. Suppose that A has won (N − r) throws and lost r
throws; then his profit is

$$P = (N - r) - r,$$

giving

$$P = N - 2r \quad\text{or}\quad r = \tfrac{1}{2}(N - P). \tag{1}$$

(Note here that N and P are odd or even together so that r is
always an integer.) Again looking at Fig. 6.1 we see that we
have to take r steps to the left out of the total of N steps and
we further notice that it does not matter in which order these
steps are taken. Thus the number of sequences which end in
making a profit P after N throws is given exactly by the number
of ways of choosing r objects from N objects, ordering being
irrelevant, i.e. the binomial coefficient ${}^N C_r$. Hence using (1) we
find

the number of sequences ending with a profit P after N throws

$$= \frac{N!}{[\tfrac{1}{2}(N - P)]!\,[\tfrac{1}{2}(N + P)]!} = {}^N C_r.$$

Since the total number of sequences is $2^N$ we have

the probability of ending with a profit P after N throws

$$= W(P, N) = \frac{1}{2^N}\,\frac{N!}{[\tfrac{1}{2}(N - P)]!\,[\tfrac{1}{2}(N + P)]!}. \tag{2}$$

This is a binomial distribution and is shown for a typical 
example in Fig. 6.2. 



Fig. 6.2. Graph of W(P, 10) against P.


We note that

$$\sum_{P=-N}^{N} W(P, N) = \frac{1}{2^N}\sum_{r=0}^{N} {}^N C_r = 1 \tag{3}$$

which states the obvious fact that A must have made a profit
somewhere between +N and −N. To calculate the average
profit that A makes after N throws we need to compute

$$\bar{P} = \sum_{P=-N}^{N} P\,W(P, N). \tag{4}$$


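It is less obvious how to perform this summation but we take
a hint from the use of binomial coefficients in equation (3);
introduce a variable x and consider the following sum:

$$\begin{aligned}
\sum_{P=-N}^{N} W(P, N)\,x^P &= \frac{1}{2^N}\sum_{r=0}^{N} {}^N C_r\, x^{N-2r} \qquad [\text{using (1)}] \\
&= (\tfrac{1}{2}x)^N \sum_{r=0}^{N} {}^N C_r\, (x^{-2})^r \\
&= (\tfrac{1}{2}x)^N (1 + x^{-2})^N
\end{aligned}$$

or

$$\sum_{P=-N}^{N} W(P, N)\,x^P = \left[\tfrac{1}{2}\left(x + x^{-1}\right)\right]^N. \tag{5}$$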

The right-hand side of equation (5) is called the generating
function for the W(P, N), since the coefficient of $x^P$ in the
expansion of the right-hand side gives W(P, N). By putting
x = 1 into equation (5) we see that we reproduce equation (3).
To calculate $\bar{P}$ from equation (4) we differentiate equation (5)
with respect to x:

$$\frac{d}{dx}\left(\sum_{P=-N}^{N} W(P, N)\,x^P\right) = \sum_{P=-N}^{N} P\,W(P, N)\,x^{P-1}$$

and if we again substitute x = 1, where the derivative of the
right-hand side of (5), namely $N\left[\tfrac{1}{2}(x + x^{-1})\right]^{N-1}\tfrac{1}{2}(1 - x^{-2})$, vanishes, we deduce that

$$\bar{P} = 0$$
or on average we will have made zero profit, a result which we
could have anticipated from the symmetry of Fig. 6.2. A
similar argument, differentiating once more, shows that

$$\overline{P^2} = N$$

giving the root mean square deviation as $N^{1/2}$; this measures
the average spread of the profit or the width of the curve in
Fig. 6.2.
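
These three results (total probability 1, mean profit 0, mean square profit N) can be checked directly from equation (2). The following is a minimal sketch using exact fractions; the function name `W` mirrors the text, while N = 10 is simply the case plotted in Fig. 6.2.

```python
from fractions import Fraction
from math import comb

def W(P, N):
    """Probability of a profit P after N fair throws, as in equation (2)."""
    if abs(P) > N or (N - P) % 2:
        return Fraction(0)              # P and N must have the same parity
    r = (N - P) // 2                    # number of losing throws
    return Fraction(comb(N, r), 2 ** N)

N = 10
profits = range(-N, N + 1)
total = sum(W(P, N) for P in profits)               # equation (3)
mean = sum(P * W(P, N) for P in profits)            # equation (4)
mean_square = sum(P * P * W(P, N) for P in profits)
```

Working with `Fraction` rather than floating point keeps the sums exact, so the comparison with 1, 0 and N involves no rounding.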


Both players with finite capital 

We now use our generating function method to attack the 
full problem, but this time we use the method from the begin¬ 
ning of the calculation. The analysis of this problem (i.e. the 
chances of A or B being bankrupt after N throws) can be 
followed through but it leads to more complicated mathematics 
than we wish here, so we content ourselves in answering the 
two questions (i) what are the chances of A winning at some 
time during the game?, (ii) how long on average will the 
game last? 

In order to answer the first question we introduce

$A_k$ = the probability that A wins if A starts with k pennies,

and we see that our eventual aim is to calculate $A_M$. Further
we note that $A_0 = 0$ since in this case A is already bankrupt
and $A_{L+M} = 1$ since A has already won. Suppose that at some
stage of the game A has k pennies; then his chance of winning
at this stage is $A_k$. The coin is then tossed; if A wins he ends
up with (k + 1) pennies and his chance of winning is now
$A_{k+1}$; if A loses he ends up with (k − 1) pennies and his chance
of winning is $A_{k-1}$. Since after this toss his chance of winning
the whole game is still the same we can write

$$A_k = \tfrac{1}{2}A_{k-1} + \tfrac{1}{2}A_{k+1} \qquad (k = 1, 2, \dots, L + M - 1)$$

of course remembering that $A_0 = 0$ and $A_{M+L} = 1$. (Here again
we can deal with the tossing of a bad coin by replacing the
½'s by p and q.) Rewriting these equations in full we get

$$\begin{aligned}
A_1 &= \tfrac{1}{2}A_2 \\
A_2 &= \tfrac{1}{2}A_1 + \tfrac{1}{2}A_3 \\
A_3 &= \tfrac{1}{2}A_2 + \tfrac{1}{2}A_4 \\
&\;\;\vdots \\
A_{M+L-2} &= \tfrac{1}{2}A_{M+L-3} + \tfrac{1}{2}A_{M+L-1} \\
A_{M+L-1} &= \tfrac{1}{2}A_{M+L-2} + \tfrac{1}{2}
\end{aligned} \tag{6}$$

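We now introduce our generating function

$$\mathcal{A}(x) = A_1 x + A_2 x^2 + \dots + A_{M+L-1}x^{M+L-1}. \tag{7}$$

To use this function, we multiply the first equation in (6) by x,
the second equation by $x^2$ and so on, and then add to give

$$\begin{aligned}
\mathcal{A}(x) &= \tfrac{1}{2}x\left(A_1 x + A_2 x^2 + \dots + A_{M+L-2}x^{M+L-2}\right) \\
&\quad + \tfrac{1}{2}x^{-1}\left(A_2 x^2 + A_3 x^3 + \dots + A_{M+L-1}x^{M+L-1}\right) + \tfrac{1}{2}x^{M+L-1} \\
&= \tfrac{1}{2}x\left(\mathcal{A} - A_{M+L-1}x^{M+L-1}\right) + \tfrac{1}{2}x^{-1}\left(\mathcal{A} - A_1 x\right) + \tfrac{1}{2}x^{M+L-1}
\end{aligned}$$

or rearranging

$$\mathcal{A}(x)\left(1 - \tfrac{1}{2}x - \tfrac{1}{2}x^{-1}\right) = \tfrac{1}{2}x^{M+L-1} - \tfrac{1}{2}A_{M+L-1}x^{M+L} - \tfrac{1}{2}A_1. \tag{8}$$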

In the right-hand side we now need to calculate the unknowns
$A_1$ and $A_{M+L-1}$. We can easily derive one relation between them
by putting x = 1 into equation (8), giving

$$0 = \tfrac{1}{2} - \tfrac{1}{2}A_{M+L-1} - \tfrac{1}{2}A_1. \tag{9}$$

The second relation that is needed can be obtained by
differentiating (8) with respect to x,

$$\frac{d}{dx}\left\{\mathcal{A}(x)\left(1 - \tfrac{1}{2}x - \tfrac{1}{2}x^{-1}\right)\right\} = \tfrac{1}{2}(M + L - 1)x^{M+L-2} - \tfrac{1}{2}(M + L)A_{M+L-1}x^{M+L-1},$$

and again putting x = 1 gives

$$A_{M+L-1} = \frac{M + L - 1}{M + L}.$$

Using equation (9) we easily find that

$$A_1 = \frac{1}{M + L}.$$

Substituting these values into equation (8) and doing some
algebraic manipulation we have

$$\mathcal{A}(x) = \frac{x\left(1 - x^{M+L}\right)}{(M + L)(1 - x)^2} - \frac{x^{M+L}}{1 - x}.$$

Expanding the denominators by the binomial theorem we find
that all powers of x higher than (M + L − 1) cancel and we are
left with our final result that

$$\mathcal{A}(x) = \sum_{k=1}^{M+L-1} \frac{k}{M + L}\,x^k. \tag{10}$$

Comparing equations (7) and (10) we see that $A_k = k/(M + L)$
and in particular for the original game, when A has M pennies,

probability of A winning = M/(M + L)

and similarly

probability of B winning = L/(M + L).
We deduce from these results that a player’s chance of winning
is proportional to his initial capital and, furthermore, that the
game is sure to end since the sum of these probabilities is unity.
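
As a cross-check on the generating-function result, the difference equations for the winning probabilities can also be solved by direct recursion. This is a sketch of a different (and more elementary) route than the one used in the text: the function name is illustrative, and the trial value 1 for the first unknown is an arbitrary choice that is rescaled at the end, which is legitimate because the equations are linear.

```python
from fractions import Fraction

def win_probabilities(M, L):
    """Solve A_k = (A_{k-1} + A_{k+1}) / 2 with A_0 = 0 and A_{M+L} = 1.

    The recurrence is run forward from A_0 = 0 with a trial value A_1 = 1,
    then every entry is rescaled so that the last one equals 1."""
    n = M + L
    a = [Fraction(0), Fraction(1)]
    for k in range(1, n):
        a.append(2 * a[k] - a[k - 1])   # rearranged difference equation
    scale = a[n]
    return [value / scale for value in a]

probs = win_probabilities(10, 5)
chance_A_wins = probs[10]               # A starts with M = 10 pennies
chance_B_wins = 1 - chance_A_wins
```

For M = 10 and L = 5 the list `probs` reproduces k/(M + L) for every k.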


Length of game 

The computation of the average length of the game follows 
very similar lines to the above. We first define 

$T_k$ = expected length of the game if A holds k pennies

in the hope that we can eventually calculate $T_M$. We can see
immediately that $T_0 = T_{M+L} = 0$ since in either case one of the
players is bankrupt, the game is over and hence takes zero time
to complete. The basic equations for the calculation of the
$T_k$ are

$$T_k = \tfrac{1}{2}(1 + T_{k-1}) + \tfrac{1}{2}(1 + T_{k+1})$$

where we remember that $T_0 = T_{M+L} = 0$. It is left to the reader
to convince himself that these equations are correct, to
construct the solution in the above manner and to deduce that
$T_k = k(M + L - k)$, giving in particular $T_M = LM$.
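
A reader who wants to check the stated answer before attempting the generating-function derivation can solve the equations by direct recursion. The sketch below is not the method the text asks for: it splits the solution into a particular part and a homogeneous part (an assumption of this illustration, with hypothetical names `u` and `v`) and fixes the one free constant from the boundary condition at k = M + L.

```python
from fractions import Fraction

def expected_lengths(M, L):
    """Solve T_k = 1 + (T_{k-1} + T_{k+1}) / 2 with T_0 = T_{M+L} = 0.

    u is the solution started with T_1 = 0, v solves the equation without
    the constant term and starts with v_1 = 1; T = u + t*v, where t is
    chosen so that T vanishes at k = M + L."""
    n = M + L
    u = [Fraction(0), Fraction(0)]
    v = [Fraction(0), Fraction(1)]
    for k in range(1, n):
        u.append(2 * u[k] - u[k - 1] - 2)
        v.append(2 * v[k] - v[k - 1])
    t = -u[n] / v[n]
    return [u[k] + t * v[k] for k in range(n + 1)]

lengths = expected_lengths(10, 5)
average_game_length = lengths[10]       # A starts with M = 10 pennies
```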

All the results obtained so far have been intuitively reasonable
but this last result gives our intuition a surprise. For,
consider the case M → ∞, that is A starts with infinite capital
while B starts with L pennies. Our results now tell us that the
probability of A winning tends to unity or in other words A is
sure to win eventually; this is a very reasonable result. On the
other hand, we see that the average length of the game tends to
infinity, which is much longer than expected.

Even for a specific example this time appears long; for instance,
take M = 10 and L = 5. Our results tell us that the
probability of A winning is 2/3 and of B winning is 1/3 and that on
average the game will last the surprisingly long time of fifty
tosses. With these particular values, ten games were played
and are illustrated in Fig. 6.3. A method of construction of
these games was used which is certainly much quicker than
actually tossing a coin. Tables of random digits were taken and
starting at any point the sequence of digits was followed, taking
an even digit as a win for A and an odd digit as a win for B.
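
The same construction can be imitated on a computer. This sketch replaces the printed table of random digits with Python's pseudo-random generator, which is an assumption of the illustration and not the author's procedure; the seed is arbitrary and the function name is hypothetical. No particular outcome is claimed for the ten games.

```python
import random

def play_game(M, L, rng):
    """Play one game: an even digit is a win for A, an odd digit a win for B.

    Returns the winner ("A" or "B") and the number of tosses used."""
    profit = 0                          # A's running profit
    tosses = 0
    while -M < profit < L:
        digit = rng.randrange(10)       # stands in for the next table digit
        profit += 1 if digit % 2 == 0 else -1
        tosses += 1
    return ("A" if profit == L else "B"), tosses

rng = random.Random(1966)               # arbitrary seed, for repeatability
games = [play_game(10, 5, rng) for _ in range(10)]
```

Averaging `tosses` over many more than ten games should approach the value LM, though any single run of ten can differ from it considerably.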


Diffusion 

Gambling problems of the type discussed have been used 
extensively as models to describe diffusion phenomena. For 
instance, Einstein used this type of analysis in a study of the 
diffusion of colloids or what is usually called Brownian motion. 
The motion is caused by the large colloidal particles under¬ 
going small, random displacements caused by collisions with 
the much smaller solution molecules. Einstein described the 
obvious analogy between such motion and the tossing-penny 
game (the game we have described corresponds to one¬ 
dimensional colloidal motions), and showed that the basic 
equations of our game reduced, as the step length and the time
for a step both tend to zero, to the classical diffusion equation
which was known to govern the large scale diffusion. Thus
using the comparatively simple mathematics of the model pro¬ 
posed, he was able to predict the large-scale properties of the 
system and deduce such quantities as the viscosity and the 
diffusion coefficient in terms of the microscopic data of the 
molecules concerned. 

The two gamblers with their finite amount of money in these 
applications represent the walls of the container enclosing the 
system. The walls in this case would be absorbing walls but 
reflecting walls can be studied analogously.


Recurrence paradox 

A further example of the use of such models of gambling is in 
the explanation of the “recurrence paradox”. A description of 
the paradox can be given by considering the following example. 
Suppose a drop of ink is placed in a beaker of water, the ink 
will diffuse into the water until we have a solution which 
appears uniform. The question that is then asked is whether at 
some future time the drop of ink will reappear again. This is a 
reasonable question since Poincaré has proved a general
theorem which states that almost every state of such a system 
will reproduce itself as closely as we like after a finite time. 
This very perturbing paradox is certainly in contradiction to 
experience and intuition. 

The problem was finally cleared up by the Ehrenfests (1907)
who proposed a model which exhibited most of the features
required. The model consisted of two bags A and B containing
between them N balls, the balls being numbered consecutively
from 1 to N. A number between 1 and N is chosen at random;
if the corresponding ball is in bag A it is removed and placed
in bag B and vice versa. This model is very similar to the
tossing-pennies game, the difference being that the probability
of a win or loss at each stage is not ½ but depends on the profit
at that stage of the game. The solution of the mathematical
problem posed by the model is naturally more complicated
than the case we considered, but it can be obtained by basically
the same generating function method described above. The
additional difficulty is concerned with extracting the relevant
information from the generating function once it has been
obtained.

Imagining the heat content of each bag to be proportional 
to the number of balls it contains, we see that in general heat 
will flow from the hot bag to the cold bag. It can be shown 
further that such a cooling follows Newton's law of cooling. 
The equilibrium that results, however, is not a static but a 
dynamic equilibrium. That is, in such a state, although we
have $N_A = N_B$ we get fluctuations around these equilibrium
values and some of the fluctuations in fact can be very large.
For instance, suppose we start with bag A containing 2000 
balls and bag B no balls. The cooling will take place until 
there are about 1000 balls in each bag. It is clear, however, 
that there is a finite probability that our original configuration 
will occur again. The mathematics of this model leads us to 
the result, for this particular case, that the average time before 
recurrence will be $2^{2000}\tau$, where $\tau$ is the time for each ball
transference. Taking this time $\tau$ to have the reasonable value
of $10^{-9}$ seconds our recurrence time becomes about $10^{585}$
years. The analogy with the drop of ink and the resolution of 
the paradox becomes obvious. This model has been used 
further to illustrate other features of reversible and irreversible 
thermodynamics. 
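
The two-bag model is easily tried out on a computer. The Python sketch below is an illustration only and is not part of the Ehrenfests' analysis: the function names `simulate` and `mean_recurrence_steps` are our own, and the formula used for the mean number of draws between visits to the state with k balls in bag A, namely 2^N divided by the binomial coefficient C(N, k), is the standard result for this model, quoted here without proof.

```python
import random
from fractions import Fraction
from math import comb


def mean_recurrence_steps(n_balls, k_in_a):
    """Mean number of draws between successive visits to the state
    with k_in_a balls in bag A (standard result, quoted not derived)."""
    return Fraction(2 ** n_balls, comb(n_balls, k_in_a))


def simulate(n_balls, steps, seed=0):
    """Start with every ball in bag A and return the list of the
    numbers of balls in bag A after 0, 1, 2, ... draws."""
    rng = random.Random(seed)
    in_a = [True] * n_balls            # in_a[i] is True while ball i is in bag A
    count_a = n_balls
    history = [count_a]
    for _ in range(steps):
        i = rng.randrange(n_balls)     # a ball number chosen at random
        in_a[i] = not in_a[i]          # move that ball to the other bag
        count_a += 1 if in_a[i] else -1
        history.append(count_a)
    return history


if __name__ == "__main__":
    steps = 200_000
    history = simulate(n_balls=6, steps=steps)
    returns = history[1:].count(6)     # visits to "all six balls in A"
    print("simulated mean recurrence:", steps / returns)
    print("exact mean recurrence:    ", mean_recurrence_steps(6, 6))
    print("average content of bag A: ", sum(history) / len(history))
```

With only six balls the original configuration comes round again quite often, so the fluctuations about equilibrium can be watched directly; the same formula with N = 2000 gives the enormous recurrence time of the example above.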

We see again that a simple microscopic model, basically our 
tossing-penny game, can be used to analyse difficult, real 
problems, producing useful ideas and results concerning the 
large-scale behaviour of such systems.

CHAPTER 7


DIFFERENTIAL EQUATIONS 

J. S. Griffith 


1. Introduction 

I want to introduce you to certain methods in the theory of
differential equations, which are of considerable interest and
importance but which will probably not be met by you in your
classes at school. I shall be speaking mainly to those who
have already met the idea of a differential equation and some
of the methods which are used to solve them. I hope, however,
that the more expert ones amongst you will bear with me while 
I say a few introductory words about what a differential 
equation is and how we proceed when we try to solve it. 

2. Integration as the reverse of differentiation 

If we are given a function Y = Y(x) of the variable x, we
know that the differential coefficient at a point is defined 
geometrically as the tangent of the angle of slope of the 
curve at that point (see Fig. 7.1). Equivalently, it may be 
defined as a limit and so we have

$$\frac{dY}{dx} = \tan\theta = \lim_{\delta x \to 0} \frac{Y(x+\delta x) - Y(x)}{\delta x}, \tag{1}$$

provided that the function Y(x) is sufficiently "well-behaved"
for the tangent and the limit to exist. From equation (1) we
see that, when $\delta x$ is small,

$$Y(x+\delta x) \approx Y(x) + \frac{dY}{dx}\,\delta x. \tag{2}$$

On the other hand, suppose we are given some other function 
y(x). Then we know that, if y(x) is always positive, the integral
of y(x) is equal to the area under the curve. Referring to Fig.
7.2 we find that

$$\int_a^x y(x)\,dx = \text{shaded area} = Y(x), \text{ say,} \tag{3}$$


where we shall relate this new Y(x) to our previous one in a
moment. But, by looking at the areas in Fig. 7.2, it is clear that 

$$\int_a^{x+\delta x} y(x)\,dx \approx Y(x) + y(x)\,\delta x \tag{4}$$

because the extra area is nearly equal to that of a rectangle with
sides y(x) and $\delta x$. However, from the definition in equation
(3), the integral in equation (4) is also equal to $Y(x+\delta x)$.
Therefore,

$$Y(x+\delta x) \approx Y(x) + y(x)\,\delta x \tag{5}$$

which, by comparison with equation (2), shows that 


$$\frac{dY}{dx} = y(x). \tag{6}$$


We have proved that integration and differentiation are 
inverse processes because we have shown that integration 
turns y(x) into Y(x) and that differentiation turns Y(x) back
into y(x) again. As both integration and differentiation are
defined separately and independently, this result is not 
obvious but has the status of an important theorem. 
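
The theorem can be checked numerically. In the Python sketch below, which is an illustration under our own assumptions, the function y(x) = 1 + cos x is an arbitrary positive example, the area is approximated by the midpoint rule, and the differential coefficient by a difference quotient in the spirit of equation (1). The names `integral` and `derivative` are ours.

```python
import math


def integral(y, a, x, n=10_000):
    """Approximate the area under y between a and x (midpoint rule)."""
    h = (x - a) / n
    return h * sum(y(a + (i + 0.5) * h) for i in range(n))


def derivative(Y, x, dx=1e-4):
    """Approximate dY/dx by a symmetric difference quotient."""
    return (Y(x + dx) - Y(x - dx)) / (2 * dx)


def y(x):
    # An arbitrary example function, positive on the range used below.
    return 1.0 + math.cos(x)


def Y(x, a=0.0):
    """The area under y from a to x, as in equation (3)."""
    return integral(y, a, x)


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0):
        # The last two columns should agree closely: dY/dx = y(x).
        print(x, y(x), derivative(Y, x))
```

Changing the lower limit a alters Y(x) by a constant only, so the second and third columns still agree.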

It is more usual to show the dependence of an integral upon 
both of its limits a and x, for example by putting 


$$Y(x) = \int_b^x y(x)\,dx$$

so that

$$\int_a^x y(x)\,dx = Y(x) - Y(a),$$

but with this definition too the equation

$$\frac{dY}{dx} = y(x)$$

holds.

The equation dY/dx = y(x), where y(x) is a known function 
of x but Y is unknown, is a differential equation. We have seen 
that the integral 

$$Y = \int_a^x y(x)\,dx$$

gives a solution to it. 


3. General differential equations 

Any equation for Y(x) involving its differential coefficients 
and other, known, functions is a differential equation for Y(x). 
For example: 


$$\frac{d^2Y}{dx^2} + 3\frac{dY}{dx} + 2Y = 0, \tag{7}$$

$$\left(\frac{dY}{dx}\right)^2 + x^2 = 0, \tag{8}$$

$$\frac{d^2Y}{dx^2} + 2\sin Y = 0 \tag{9}$$


are differential equations. Equation (7) is called a linear 
equation, because Y occurs at most once in each term, and 
(8) and (9) are non-linear equations. The vast majority of 
differential equations you will meet at school and at the 
university will be linear, although in research work one often 
meets non-linear equations. Linear equations are generally
much easier to handle than non-linear ones and many of the
general methods of solving differential equations are only
actually applicable to linear equations. In the latter part
of this chapter, however, I shall describe one of the few general 
methods which can also be applied to non-linear equations. 

First, however, let us note that if we are given a differential 
equation we can ask three equally important but rather 
different kinds of question about it. 

(i) Does it possess any solutions in terms of elementary 
functions? 

This means, is any function like $x^n$, sin x, log x, etc., a solution
of it? 

Equation (7), for example, has solutions of the kind $e^{ax}$. We
see this by trial.

If $Y = e^{ax}$, then $\dfrac{dY}{dx} = ae^{ax}$ and $\dfrac{d^2Y}{dx^2} = a^2e^{ax}$,

so

$$a^2e^{ax} + 3ae^{ax} + 2e^{ax} = 0.$$

Divide by the function $e^{ax}$, which is never zero, and we get

$$a^2 + 3a + 2 = 0,$$
$$(a+1)(a+2) = 0,$$
$$a = -1 \text{ or } -2.$$

Hence either $Y = e^{-x}$ or $Y = e^{-2x}$ is a solution. In fact it is
possible to prove that the only solutions are $Y = Ae^{-x} + Be^{-2x}$,
where A and B are any constants.
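
The trial-solution argument can also be checked by machine. The following Python sketch is an illustration with function names of our own choosing. It finds the values of a from the quadratic and then substitutes $Y = Ae^{-x} + Be^{-2x}$ into the left-hand side of equation (7), with the differential coefficients replaced by finite differences, so the result is only approximately zero.

```python
import math


def characteristic_roots(b, c):
    """Real roots a of a**2 + b*a + c = 0 (largest first)."""
    disc = b * b - 4 * c
    if disc < 0:
        raise ValueError("no real roots")
    s = math.sqrt(disc)
    return ((-b + s) / 2, (-b - s) / 2)


def residual(A, B, x, h=1e-3):
    """Left-hand side of equation (7) for Y = A e^{-x} + B e^{-2x},
    with the derivatives estimated by central differences."""
    def Y(t):
        return A * math.exp(-t) + B * math.exp(-2 * t)
    d1 = (Y(x + h) - Y(x - h)) / (2 * h)
    d2 = (Y(x + h) - 2 * Y(x) + Y(x - h)) / (h * h)
    return d2 + 3 * d1 + 2 * Y(x)


if __name__ == "__main__":
    print(characteristic_roots(3, 2))     # the two admissible values of a
    print(residual(2.0, -3.0, 0.7))       # close to zero for any A and B
```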

We learn in our classes that equations like (7) have these 
exponential solutions, but such solutions are obtained originally 
by guesswork. They are not by any means to be despised for 
this, however, as once we possess them we may usually easily 
answer any other question about the solution Y(x) in which
we may be interested. We must normally always look for an
elementary solution first and only if this fails go on to other 
methods. 

(ii) Existence theorems 

These are designed to prove, without necessarily solving 
the equation, whether it possesses any solutions and, if so, how 
many. As a simple but not very typical example, we can see 
that equation (8) has no solutions because when $x \neq 0$ the
left-hand side must be greater than 0, no matter what the value 
of the real function dY/dx. Another theorem in this category 
is that equations like equation (7) always possess just two 
independent solutions, although we shall not prove this. 


(iii) What are the general properties of the solutions? 

If question (i) is satisfactorily answered this is usually easy. 
For example, how do the solutions of equation (7) behave as 
$x \to \infty$? Any solution can be written as $Y = Ae^{-x} + Be^{-2x}$. Now
$e^{-x}$ and $e^{-2x} \to 0$ as $x \to \infty$, and so therefore does Y. But
suppose we do not have any simple solutions, but only the 
equation? In this case it is still sometimes possible to determine 
their behaviour by arguing directly from the equation. 

As an example, consider the equation 



where, for simplicity, we shall suppose y > 0 when t = 0 and 
shall investigate the behaviour of y as t increases. We cannot 
solve the equation in elementary functions but, in spite of this,
we can show that as t increases from zero, y never decreases
and is bounded above and below.

We do this by noting that, so long as y remains positive,
dy/dt satisfies the inequality

$$0 \le \frac{dy}{dt} \le \frac{y}{(1+t)^4}.$$

The first part of this inequality shows that

$$y(t) - y(0) = \int_0^t \frac{dy}{dt}\,dt \ge 0$$

and therefore $y(t) \ge y(0) > 0$ for all t. So y is always positive
and therefore the inequality

$$\frac{dy}{dt} \le \frac{y}{(1+t)^4}$$

is always true. We now rearrange this to put the part referring 
to y on one side of the inequality and that referring to t on 
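the other, and then integrate between 0 and t. Hence

$$\int_{y(0)}^{y(t)} \frac{dy}{y} \le \int_0^t \frac{dt}{(1+t)^4},$$

which gives

$$\log y(t) - \log y(0) \le \tfrac{1}{3}\left[1 - (1+t)^{-3}\right].$$

The logarithms are to base e, so on taking antilogs we find

$$y(t) \le y(0)\,e^{\frac{1}{3}[1-(1+t)^{-3}]} \le y(0)\,e^{1/3}.$$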

Thus we have shown that y(t) always lies between y(0) and
$y(0)e^{1/3}$. Because $dy/dt \ge 0$, y(t) never decreases. Our findings
are illustrated in Fig. 7.3.
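
The argument uses only the inequality $0 \le dy/dt \le y/(1+t)^4$, so it can be illustrated with any equation that obeys it. In the Python sketch below the right-hand side $y\sin^2 y/(1+t)^4$ is a hypothetical choice of our own, made solely because it satisfies that inequality when y > 0. The equation is integrated step by step with the classical Runge-Kutta method, and the function names are ours.

```python
import math


def rhs(t, y):
    # Hypothetical right-hand side (our assumption): for y > 0 it
    # satisfies 0 <= dy/dt <= y / (1 + t)**4.
    return y * math.sin(y) ** 2 / (1.0 + t) ** 4


def rk4(f, t, y, h):
    """One classical Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def solve(y0, t_end=20.0, h=0.01):
    """Return the computed values y(0), y(h), y(2h), ... up to t_end."""
    t, y, ys = 0.0, y0, [y0]
    for _ in range(round(t_end / h)):
        y = rk4(rhs, t, y, h)
        t += h
        ys.append(y)
    return ys


if __name__ == "__main__":
    ys = solve(1.0)
    print("y(0) =", ys[0], " y(20) =", ys[-1])
    print("upper bound y(0) e^(1/3) =", math.exp(1.0 / 3.0))
```

The computed values never decrease and stay below $y(0)e^{1/3}$, as the general argument requires.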


4. Graphical methods of obtaining properties of solutions 

We shall take as our main example the well-known problem 
of the simple pendulum. Let the mass of the bob be m and the 
length from the point of attachment be a. This is illustrated in 
Fig. 7.4, where the pendulum is inclined at an angle $\theta$ to the
vertical. We now derive the equation of motion by resolving
at right angles to the pendulum. This gives us

$$-mg\sin\theta = ma\frac{d^2\theta}{dt^2}. \tag{11}$$

Fig. 7.4.

Equation (11) is often treated by assuming that the amplitude 
of the motion is very small so that at all points of the motion 
$\theta$ is small enough for us to replace $\sin\theta$ by $\theta$ without significant
error. Then in place of equation (11) we have

$$-mg\theta = ma\frac{d^2\theta}{dt^2}, \tag{12}$$

which has the general solution

$$\theta(t) = A\cos\left[\left(\frac{g}{a}\right)^{1/2} t\right] + B\sin\left[\left(\frac{g}{a}\right)^{1/2} t\right].$$

The constants A and B are determined from the initial inclination,
which is $\theta(0) = A$, and the initial velocity, which is
$(a\,d\theta/dt)_0 = B(ag)^{1/2}$. Sines and cosines each repeat their
values after $2\pi$ and so the pendulum oscillates with period
T, where $T(g/a)^{1/2} = 2\pi$. Hence $T = 2\pi(a/g)^{1/2}$.

This is all very well, but what happens if the amplitude is 
not small? We may easily be interested in motions in which 
the pendulum becomes horizontal or higher. Indeed we may 
ask what is the motion if we give the pendulum such a large 
initial velocity that it swings right over the top? In these cases, 
equation (11) is still valid but it can no longer be replaced 
even approximately by equation (12). 
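
To see how far the small-amplitude formula can be trusted, equation (11) may be integrated step by step on a computer. The Python sketch below is an illustration only: it uses the classical Runge-Kutta method, takes g/a = 1 as a convenient assumption, releases the pendulum from rest at an inclination `theta0`, and times one complete swing. The function names are our own.

```python
import math


def step(theta, y, h, g_over_a):
    """One Runge-Kutta step for d(theta)/dt = y, dy/dt = -(g/a) sin(theta)."""
    def f(th, v):
        return v, -g_over_a * math.sin(th)
    k1 = f(theta, y)
    k2 = f(theta + h * k1[0] / 2, y + h * k1[1] / 2)
    k3 = f(theta + h * k2[0] / 2, y + h * k2[1] / 2)
    k4 = f(theta + h * k3[0], y + h * k3[1])
    theta += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return theta, y


def period(theta0, g_over_a=1.0, h=1e-3):
    """Release from rest at theta0 (0 < theta0 < pi); return the period."""
    if not 0.0 < theta0 < math.pi:
        raise ValueError("theta0 must lie strictly between 0 and pi")
    theta, y, t = theta0, 0.0, 0.0
    while True:
        theta_new, y_new = step(theta, y, h, g_over_a)
        t += h
        if y < 0.0 <= y_new:
            # The velocity has just changed sign: half a period is over.
            # Interpolate linearly to the instant at which y = 0.
            return 2.0 * (t - h * y_new / (y_new - y))
        theta, y = theta_new, y_new


if __name__ == "__main__":
    print("small-amplitude formula:", 2 * math.pi)
    for theta0 in (0.01, 0.5, 1.0, 2.0, 3.0):
        print(theta0, period(theta0))
```

For a very small amplitude the computed period agrees with $2\pi(a/g)^{1/2}$; as the amplitude grows the period becomes noticeably longer.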

If we wish to establish the general properties of the solutions 
of equation (11) we can use a rather remarkable graphical
method. We work in terms of $\theta$ and a new variable y which is
defined as $y = d\theta/dt$. In order to define the state of motion at
any time we need only to know the position and velocity at that
time. Hence if we plot the values of $\theta$ and y in the $\theta y$-plane we
shall be able to say that there is a unique correspondence
between each point of the plane (which gives the values of
$\theta$ and $y = d\theta/dt$) and a state of motion of the pendulum (i.e. its
position and velocity). As time goes on, so $\theta$ and y change
and hence the representative point follows a trajectory in the
$\theta y$-plane. The $\theta y$-plane is known as the phase plane.

Equation (11) may be re-expressed in terms of $\theta$ and y to give
us the pair of equations

$$\frac{d\theta}{dt} = y, \qquad \frac{dy}{dt} = -\frac{g}{a}\sin\theta. \tag{13}$$


These equations enable us to trace the trajectory in the phase 
plane. Suppose we start at a point $(\theta, y)$. Then after a small
time $\delta t$, $\theta$ will have changed to

$$\theta + \frac{d\theta}{dt}\,\delta t \quad\text{and } y \text{ to}\quad y + \frac{dy}{dt}\,\delta t$$

as we saw for Y in equation (2). This is illustrated in Fig. 7.5(a).
The direction in which the point moves in the phase plane is
determined by the angle in that right-angled triangle and hence
by the ratio of $d\theta/dt$ to $dy/dt$. We can then put in the values for
these differential coefficients, as given in equation (13). Thus
we see that the direction of motion of the point in the phase
plane depends simply upon where the point is and not on anything
else. This is shown in Fig. 7.5(b), where the arrow shows
the direction of motion starting from the point $(\theta, y)$.
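
The rule for drawing the arrows is simple enough to be stated as a short program. The Python sketch below is an illustration with names of our own: it returns the two components of the arrow given by equations (13) and the angle the arrow makes with the positive $\theta$-axis, taking g/a = 1 unless told otherwise and assuming equal scales on the two axes.

```python
import math


def arrow(theta, y, g_over_a=1.0):
    """Components (d theta/dt, dy/dt) of the arrow at the point (theta, y)."""
    return (y, -g_over_a * math.sin(theta))


def arrow_angle_degrees(theta, y, g_over_a=1.0):
    """Angle of the arrow, measured anticlockwise from the positive
    theta-axis; None where both components vanish (a stagnant point)."""
    d_theta, d_y = arrow(theta, y, g_over_a)
    if d_theta == 0.0 and d_y == 0.0:
        return None
    return math.degrees(math.atan2(d_y, d_theta))


if __name__ == "__main__":
    for point in [(0.0, 1.0), (math.pi / 2, 0.0), (0.0, -1.0), (-math.pi / 2, 0.0)]:
        print(point, arrow_angle_degrees(*point))
```

The four sample points lie on the axes around the origin; the arrows there point to the right, downwards, to the left and upwards in turn, which is the clockwise circulation about the point $\theta = y = 0$.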

Using Fig. 7.5(b), we can plot the arrows at any point in the 
$(\theta, y)$ plane. This is done in Fig. 7.6 for values of $\theta$ ranging from
$-\pi$ to $\pi$. Evidently the centre of the plane looks rather like a
whirlpool with everything rotating about the stagnant point at
$\theta = y = 0$. A question which arises immediately is this: suppose
we start at some point in the central region of the plane
and follow the trajectory around, do we get back to our starting
point after a revolution, or not? If we do get back to the starting
point then the trajectory repeats itself indefinitely as shown by
the closed dotted curves in Fig. 7.6. In particular, the value of
$\theta$ goes continually between two limits which are the same each
time. In other words the pendulum oscillates with a constant
amplitude. Experience with reasonably frictionless pendulums
suggests that this result is true, but how do we prove it?

Fig. 7.6. (Axis marks: $\theta = -\pi$, $\theta = 0$, $\theta = \pi$.)

We do so by a method which is called “finding an integral of 
the motion” although it is also closely connected with the law 
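of conservation of energy. Since

$$a\frac{d^2\theta}{dt^2} + g\sin\theta = 0,$$

we can multiply through by $d\theta/dt$ and integrate. Thus we have
first

$$a\frac{d^2\theta}{dt^2}\frac{d\theta}{dt} + g\sin\theta\,\frac{d\theta}{dt} = 0,$$

and then

$$\tfrac{1}{2}a\left(\frac{d\theta}{dt}\right)^2 - g\cos\theta = C \quad\text{(constant)},$$

which is the same as

$$\tfrac{1}{2}ay^2 - g\cos\theta = C. \tag{14}$$

When $\theta$ is so small that we can replace $\cos\theta$ by $1 - \tfrac{1}{2}\theta^2$, this
shows that the trajectories satisfy the equation

$$\tfrac{1}{2}ay^2 + \tfrac{1}{2}g\theta^2 = C + g. \tag{15}$$

Hence they are ellipses when $\theta$ is small.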

When $\theta$ is not necessarily small we note that C is constant
along any trajectory. Hence if we follow the motion around the
centre, the value of C after one revolution is still the same. So
the point which the trajectory has now reached must have the
same value of C as the point from which it started. Let us,
then, start on the positive y-axis. Then y > 0 and $\theta = 0$, so
$C = \tfrac{1}{2}ay^2 - g$. Hence the values of C are never repeated along
this axis, which shows that any trajectory which goes round the
centre must arrive back at identically the same point on the
positive y-axis. Which trajectories go round? Clearly those which
meet the line OA in Fig. 7.6, rather than the line AB. The value
of C on the line OA (where y = 0) is $-g\cos\theta$, which lies between
$-g$ and g. On the line AB (where $\theta = \pi$), C is $\tfrac{1}{2}ay^2 + g$, which is
never less than g. So trajectories for which C < g go round the
centre and are closed. Those for which C > g cross the line
AB and $\theta$ increases continually. There are also others with
C > g in the lower half of the plane, and for these $\theta$ decreases
continually. Physically the pendulum swings over the top when
C > g but not when C < g.
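
The part played by the constant C can be illustrated numerically. In the Python sketch below, which is an illustration with names and numerical values of our own choosing (a = 1, g = 9.81), the function `energy_constant` evaluates the left-hand side of equation (14), `classify` applies the test C < g or C > g, and `follow` traces a trajectory of equations (13) by the classical Runge-Kutta method so that the constancy of C can be checked.

```python
import math


def energy_constant(theta, y, a=1.0, g=9.81):
    """C = (1/2) a y**2 - g cos(theta), as in equation (14)."""
    return 0.5 * a * y * y - g * math.cos(theta)


def classify(theta, y, a=1.0, g=9.81):
    """Say what the pendulum does on the trajectory through (theta, y)."""
    c = energy_constant(theta, y, a, g)
    if c < g:
        return "oscillates"
    if c > g:
        return "swings over the top"
    return "borderline"


def follow(theta, y, t_end, a=1.0, g=9.81, h=1e-3):
    """Integrate equations (13) for a time t_end; return (theta, y)."""
    def f(th, v):
        return v, -(g / a) * math.sin(th)
    for _ in range(round(t_end / h)):
        k1 = f(theta, y)
        k2 = f(theta + h * k1[0] / 2, y + h * k1[1] / 2)
        k3 = f(theta + h * k2[0] / 2, y + h * k2[1] / 2)
        k4 = f(theta + h * k3[0], y + h * k3[1])
        theta += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return theta, y


if __name__ == "__main__":
    for y0 in (6.0, 7.0):              # two starting points on the y-axis
        theta, y = follow(0.0, y0, 10.0)
        print(y0, classify(0.0, y0),
              energy_constant(0.0, y0), energy_constant(theta, y))
```

With these values the dividing line on the positive y-axis is at $y = 2(g/a)^{1/2}$, so the first starting point gives an oscillation and the second a motion over the top; in both cases the computed C is the same at the end as at the start, to within the error of the numerical method.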

So far we have neglected friction. The equation of motion, 
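including friction, might be

$$ma\frac{d^2\theta}{dt^2} = -mg\sin\theta - k\frac{d\theta}{dt}, \tag{16}$$

with k a positive constant, although the method we are about to
use would often still be valid even if the frictional force showed
a more complicated dependence on $d\theta/dt$ and perhaps on $\theta$ also.
From equation (16) we get the equations

$$\frac{d\theta}{dt} = y, \qquad \frac{dy}{dt} = -\frac{g}{a}\sin\theta - \frac{k}{ma}\,y \tag{17}$$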


for the direction of motion along the trajectories in the phase 
plane. By drawing a diagram as in Fig. 7.5(b) we easily see that 
this means that in the upper half of the phase plane, the arrows 
point more downwards than before. In the lower half, they 
point more upwards. As a consequence, at all points (except 
where y = 0) the arrows point inwards across the closed curves
which we found in the frictionless case and plotted in Fig. 7.6.
Hence, all trajectories move indefinitely inwards towards the
point $\theta = y = 0$, which is the position at rest for the pendulum.
This is shown in Fig. 7.7. Thus we have shown that the pendulum
now swings with ever-decreasing amplitude.

Fig. 7.7.
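
The inward spiral can be watched numerically. The Python sketch below is an illustration under assumptions of our own: the friction is taken proportional to the velocity, so that dy/dt = -(g/a) sin θ - c y, and the coefficient c (set to 0.1, with g/a = 1) is a hypothetical value chosen only for the demonstration. The pendulum is released from rest and the inclination is recorded at each extreme of the swing.

```python
import math


def damped_step(theta, y, h, g_over_a, c):
    """One Runge-Kutta step for d(theta)/dt = y,
    dy/dt = -(g/a) sin(theta) - c*y (c: hypothetical friction coefficient)."""
    def f(th, v):
        return v, -g_over_a * math.sin(th) - c * v
    k1 = f(theta, y)
    k2 = f(theta + h * k1[0] / 2, y + h * k1[1] / 2)
    k3 = f(theta + h * k2[0] / 2, y + h * k2[1] / 2)
    k4 = f(theta + h * k3[0], y + h * k3[1])
    theta += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return theta, y


def turning_amplitudes(theta0, g_over_a=1.0, c=0.1, t_end=60.0, h=1e-3):
    """Release from rest at theta0 and record |theta| each time the
    velocity y changes sign, i.e. at each extreme of the swing."""
    theta, y = theta0, 0.0
    amplitudes = [abs(theta0)]
    for _ in range(round(t_end / h)):
        theta_new, y_new = damped_step(theta, y, h, g_over_a, c)
        if y != 0.0 and (y < 0.0) != (y_new < 0.0):
            amplitudes.append(abs(theta_new))
        theta, y = theta_new, y_new
    return amplitudes


if __name__ == "__main__":
    print("with friction:   ", turning_amplitudes(2.0)[:6])
    print("without friction:", turning_amplitudes(2.0, c=0.0)[:6])
```

With friction each recorded amplitude is smaller than the one before; with c = 0 the amplitudes stay the same, as in the closed curves of the frictionless case.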


5. General analysis in the phase plane 

Equations (13) are rather special, in that because they were 
derived from equation (11) by setting $y = d\theta/dt$ we must
obviously have $d\theta/dt = y$. However, the graphical method can
equally well be applied to equations of the kind

$$\frac{dx}{dt} = P(x, y), \qquad \frac{dy}{dt} = Q(x, y), \tag{18}$$

where neither x nor y is necessarily the differential coefficient
of the other one. If we are given equations like (18) our method
of analysis proceeds by first asking which are the points for
which dx/dt = dy/dt = 0. Evidently the trajectory from such
a point degenerates because it never moves away from the
point. Such points therefore correspond to stationary values of
x and y. They are called singular points.

Next we investigate the behaviour of the trajectories in the 
neighbourhood of the singular points. There are many
possibilities; three of the simplest are illustrated in Fig. 7.8. We
have met each of these in our discussion of the pendulum. The
point O in Fig. 7.6 is a centre, but when we include friction it
becomes a focal point as illustrated in Fig. 7.7. In Figs. 7.7 and
7.8(c) the trajectories move in towards the singular point,
which is called therefore a stable focal point. It is also possible
for them to move outwards, as we see in Fig. 7.9, and such a
singular point is called an unstable focal point. Finally, points
A and D in Fig. 7.6 are saddle points.

Fig. 7.9.

We cannot go any further into general methods of analysis,
except to mention one very interesting kind of trajectory for
which one looks. The singular points represent a very degenerate
form of repeating, or periodic, motion simply because the
values of x and y never change. A more interesting kind of 
trajectory representing a periodic motion is the limit cycle. 
This is a closed trajectory in the phase plane such that either all 
neighbouring trajectories move towards it (stable limit cycle) 
or all move away (unstable limit cycle). A stable limit cycle is 
shown in Fig. 7.9. 

An example of equations which give the situation illustrated 
in Fig. 7.9, but with a circle for the limit cycle, is

$$\frac{dx}{dt} = y + \frac{x}{\sqrt{x^2+y^2}}\,(1 - x^2 - y^2),$$
$$\frac{dy}{dt} = -x + \frac{y}{\sqrt{x^2+y^2}}\,(1 - x^2 - y^2). \tag{19}$$


The singular points are at dx/dt = dy/dt = 0 and can therefore
only occur if

$$y\frac{dx}{dt} - x\frac{dy}{dt} = y^2 + x^2 = 0$$

and hence only at x = y = 0. On the other hand if we consider
the way in which the square, $r^2 = x^2 + y^2$, of the distance from
the origin varies along a trajectory, we find

$$\frac{d(r^2)}{dt} = 2x\frac{dx}{dt} + 2y\frac{dy}{dt}
= \frac{2x^2}{\sqrt{x^2+y^2}}\,(1 - x^2 - y^2) + \frac{2y^2}{\sqrt{x^2+y^2}}\,(1 - x^2 - y^2)
= 2r(1 - r^2).$$

But

$$\frac{d(r^2)}{dt} = 2r\frac{dr}{dt}$$

and therefore (for $r \neq 0$)

$$\frac{dr}{dt} = 1 - r^2. \tag{20}$$


Hence if r = 1, dr/dt = 0, and the circle r = 1 is a closed
trajectory in the xy-plane. On the other hand, equation (20)
shows that r decreases if it is initially greater than 1 and
increases if it lies between 0 and 1. Therefore the circle r = 1 is
a stable limit cycle.
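
The approach to the circle r = 1 can be confirmed by following trajectories of equations (19) numerically. The Python sketch below is an illustration with function names of our own; it uses the classical Runge-Kutta method and starts one trajectory inside the circle and one outside.

```python
import math


def field(x, y):
    """Right-hand sides of equations (19); not defined at the origin."""
    r = math.hypot(x, y)
    s = (1.0 - r * r) / r
    return y + x * s, -x + y * s


def radius_after(x, y, t_end=10.0, h=1e-3):
    """Follow the trajectory from (x, y) for a time t_end and return
    the final distance r from the origin."""
    for _ in range(round(t_end / h)):
        k1 = field(x, y)
        k2 = field(x + h * k1[0] / 2, y + h * k1[1] / 2)
        k3 = field(x + h * k2[0] / 2, y + h * k2[1] / 2)
        k4 = field(x + h * k3[0], y + h * k3[1])
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return math.hypot(x, y)


if __name__ == "__main__":
    print("started inside  (r = 0.2):", radius_after(0.2, 0.0))
    print("started outside (r = 2.0):", radius_after(2.0, 0.0))
```

Both trajectories end very close to r = 1. For a start inside the circle the result can also be compared with the solution r = tanh(t + c) of equation (20), where the constant c is fixed by the initial value of r.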